Preface


The AIX and RS/6000 Certifications offered through the Professional 
Certification Program from IBM are designed to validate the skills required of 
technical professionals who work in the powerful and often complex 
environments of AIX and RS/6000. A complete set of professional
certifications is available. They include:

• IBM Certified AIX User 

• IBM Certified Specialist - RS/6000 Solution Sales 

• IBM Certified Specialist - AIX System Administration 

• IBM Certified Specialist - AIX System Support 

• IBM Certified Specialist - RS/6000 SP 

• IBM Certified Specialist - RS/6000 SP and PSSP V3 

• RS/6000 SP - Sales Qualification 

• IBM Certified Specialist - AIX HACMP 

• IBM Certified Specialist - Domino for RS/6000 

• IBM Certified Specialist - Web Server for RS/6000 

• IBM Certified Specialist - Business Intelligence for RS/6000 

• IBM Certified Advanced Technical Expert - RS/6000 AIX 

Each certification is developed by following a thorough and rigorous process 
to ensure the exam is applicable to the job role and is a meaningful and 
appropriate assessment of skill. Subject matter experts who successfully 
perform the job participate throughout the entire development process. These 
job incumbents bring a wealth of experience into the development process, 
thus making the exams much more meaningful than the typical test that only
captures classroom knowledge. These experienced subject matter experts 
ensure the exams are relevant to the real world and that the test content is 
both useful and valid. The result is a certification of value, which appropriately 
measures the skill required to perform the job role. 

This redbook is designed as a study guide for professionals wishing to 
prepare for the certification exam to achieve IBM Certified Specialist - 
RS/6000 SP. 

The RS/6000 SP specialist certification validates the skills required to install 
and configure RS/6000 Scalable POWERparallel (SP) system software and 
to perform the administrative and diagnostic activities needed to support 
multiple users in an SP environment. The certification is applicable to 
specialists who implement and/or support RS/6000 SP systems.

This redbook helps RS/6000 SP specialists seeking a comprehensive and 
task-oriented guide for developing the knowledge and skills required for 


© Copyright IBM Corp. 2000 


certification. It is designed to provide a combination of theory and practical 
experience needed for a general understanding of the subject matter. It also 
provides sample questions that will help in the evaluation of personal 
progress and provides familiarity with the types of questions that will be 
encountered in the exam. 

This redbook will not replace the practical experience you should have. 
Instead, it is an effective tool that, when combined with education activities 
and experience, should prove to be a very useful preparation guide for the 
exam. Due to the practical nature of the certification content, this publication 
can also be used as a desk-side reference. So, whether you are planning to 
take the RS/6000 SP and PSSP exam, or if you just want to validate your 
RS/6000 SP skills, this book is for you. 

For additional information about certification and instructions on how to 
register for an exam, call IBM at 1-800-426-8322 or visit the IBM Certification
Web Site at: http://www.ibm.com/certify 

Chapter 1. Introduction


This guide is not a replacement for the SP product documentation, existing
ITSO redbooks, or the value of real experience installing and configuring
RS/6000 SP environments.

RS/6000 SP knowledge alone is not sufficient to pass the exam. Basic AIX and
AIX administration skills are also required.

You should be fluent with all topics addressed in this redbook before
taking the exam. If you do not feel confident with your skills in one of these
topics, consult the reference documentation listed in each chapter.

The RS/6000 SP Certification exam is divided into two sections: 

Section One - Is a series of general SP and PSSP related questions. 

Section Two - Is based on a scenario in a customer environment that begins
with a basic SP configuration. In this scenario, as the customer's
requirements evolve, so does the SP configuration. As the scenario develops,
additional partitions, nodes, frames, and system upgrades are required.

To prepare you for both sections, each chapter includes a section that lists
the key concepts you should understand before taking the exam. We have also
included a similar scenario to which all the chapters in the redbook refer.
This scenario is described in 1.2, “The test scenario” on page 2.


1.1 Book organization 

This guide will present you with all domains in the scope of the RS/6000 SP 
Certification exam. The structure of the book follows the normal flow that a 
standard RS/6000 SP installation may have. 

Part 1, “System planning” on page 5, contains chapters dedicated to the initial 
planning, as well as to the initial setup, of a standard RS/6000 SP 
environment. It also includes concepts and examples about SP security and 
user management. 

Part 2, “Installation and configuration” on page 249, contains chapters 
describing the actual implementation of the different steps for installing and 
configuring the control workstation, nodes, and switches. It also includes a 
chapter for system verification as a post-installation activity. 

Part 3, “Application enablement” on page 313, contains chapters for the
planning and configuration of additional products that are present in most of 
the RS/6000 SP installations. This includes the IBM Virtual Shared Disk and 
the IBM Recoverable Virtual Shared Disk, as well as GPFS, and a section 
dedicated to problem management tools available in PSSP. 

Part 4, “On-going support” on page 387, contains chapters dedicated to 
software maintenance, system reconfiguration including migration, and 
problem determination procedures and checklists. 

Each chapter is organized as follows: 

• Introduction - This contains a brief overview and set of goals for the 
chapter. 

• Key concepts you should study - This section provides a list of concepts 
that need to be understood before taking the exam. 

• Main section - This contains the body of the chapter. 

• Related documentation - Contains a comprehensive list of references to 
SP manuals and redbooks with specific pointers to the chapters and 
sections covering the concepts in the chapter. 

• Sample questions - A set of questions that serve two purposes. First is to 
check your progress with the topics covered in the chapter. Second is to 
become familiar with the type of questions you may encounter in the exam. 

• Exercises - The purpose of the exercise questions is to further explore and 
develop areas covered in the chapter. 

There are many ways to perform the same action in an SP environment:
command line, SMIT or SMITTY, spmon -g (PSSP 2.4 or below), IBM SP
Perspectives, and so on. The certification exam is not restricted to one of
these methods. You should know each one and, in particular, the syntax
of the most useful commands.


1.2 The test scenario 

As a way to present you with a similar situation to the one you may encounter 
in the SP Certification exam, we have included a test scenario that we will 
use in all sections of this study guide. The scenario is depicted in Figure 1 on 
page 3. 

We will start with the first frame (Frame 1) and 11 nodes, and then we will add 
a second frame (Frame 2) later on when we discuss reconfiguration in Part 3. 

Figure 1. Study guide test environment

The environment is fairly complex in the sense that we have defined two 
Ethernet segments and a boot/install server (BIS) to better support our future 
expansion to a second frame where we will add a third Ethernet segment and 
an additional boot/install server for the second frame. 

Although, strictly speaking, we should not need multiple Ethernet segments 
for our scenario, we have decided to include multiple segments in order to 
introduce an environment where networking, and especially routing, has to be 
considered. Details about networking can be found in Chapter 3, “RS/6000
SP networking” on page 75.

The boot/install servers were selected following the default options offered by 
PSSP. The first node in each frame is designated as the boot/install server 
for the rest of the nodes in that frame.

The frame numbering has been selected to be consecutive because each 
frame has thin nodes in it; hence, it cannot have expansion frames. 
Therefore, there is no need to skip frame numbers for future expansion
frames.

Chapter 2. Validate hardware and software configuration


This chapter discusses the hardware components of the RS/6000 SP, such as 
node types, control workstation, frames, and switches. It also provides some 
additional information on disk, memory, and software requirements. 


2.1 Key concepts you should study 

The topics covered in this section provide good preparation for the
RS/6000 SP certification exam. Before taking the exam, make sure you
understand the following key concepts: 

• What hardware components comprise an SP system? 

• The types and models of nodes, frames, and switches. 

• Hardware and software requirements for the control workstation. 

• Levels of PSSP and AIX supported by nodes and control workstations 
(especially in mixed environments). 


2.2 Hardware 

The basic components of the RS/6000 SP are: 

• The frame with its integral power subsystems. 

• Processor nodes (includes SP-Attached Servers). 

• Optional dependent nodes that serve a specific function, such as 
high-speed network connections. 

• Optional SP Switch and Switch-8 to expand your system. 

• Control workstation (a high-availability option is also available). 

• Network connectivity adapters and peripheral devices, such as tape 
and disk drives. 

These components connect to your existing computer network through a local 
area network (LAN), thus making the RS/6000 SP system accessible from
any network-attached workstation. 

Figure 2 on page 8 shows a sample of RS/6000 SP components. 

Figure 2. Sample RS/6000 SP with external node


2.3 Frames 

The building block of RS/6000 SP is the frame. There are two sizes: The tall 
frame (1.93 meters high) and the short frame (1.25 meters high). RS/6000 SP 
internal nodes are mounted in either a tall or short frame. A tall frame has 
eight drawers, while a short frame has four drawers. Each drawer is further 
divided into two slots. A thin node occupies one slot; a Wide node occupies 
one drawer (two slots), and a High node occupies two drawers (four slots). An 
internal power supply is included with each frame. Frames get equipped with 
optional processor nodes and switches. 
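
The slot arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers quoted in this section (drawers per frame, two slots per drawer, slots per node type); the function names are hypothetical, and the check considers slot count alone. It does not model drawer alignment, switch boards, or power requirements.

```python
# Slot arithmetic for SP frames, using only the figures quoted in the text.
# All names here are illustrative; they are not part of any PSSP tool.
DRAWERS = {"tall": 8, "short": 4}                # drawers per frame
SLOTS_PER_DRAWER = 2                             # each drawer has two slots
NODE_SLOTS = {"thin": 1, "wide": 2, "high": 4}   # slots occupied per node


def frame_slots(frame_type: str) -> int:
    """Total node slots in one frame of the given type."""
    return DRAWERS[frame_type] * SLOTS_PER_DRAWER


def max_nodes(frame_type: str, node_type: str) -> int:
    """Largest number of identical nodes that fit in one frame."""
    return frame_slots(frame_type) // NODE_SLOTS[node_type]


def fits(frame_type: str, nodes: dict) -> bool:
    """True if the requested node mix fits by slot count alone."""
    used = sum(NODE_SLOTS[kind] * count for kind, count in nodes.items())
    return used <= frame_slots(frame_type)


if __name__ == "__main__":
    for frame in DRAWERS:
        for node in NODE_SLOTS:
            print(frame, node, max_nodes(frame, node))
```

For example, `fits("tall", {"thin": 6, "wide": 3, "high": 1})` uses all 16 slots of a tall frame, while the same mix does not fit in a short frame.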

There are five current types of frames: 

• The tall model frame 

• The short model frame 

• The tall expansion frame 

• The short expansion frame 

• The SP Switch frame 

The model frame is always the first frame in an SP system. It designates the
type or model class of your SP system. The optional model types are either a 
tall frame system or a short frame system. Other frames that you connect to 
the model frame are known as expansion frames. The SP Switch frame is 
used to host switches or Intermediate Switch Boards (ISB), which are 
described later in this chapter. This special type of frame can host up to eight 
switch boards. 

Since the original RS/6000 SP product was made available in 1993, there 
have been a number of model and frame configurations. The frame and the 
first node in the frame were tied together forming a model. Each configuration 
was based on the frame type and the kind of node installed in the first slot. 
This led to an increasing number of possible prepackaged configurations as 
more nodes became available. 

The introduction of a new tall frame in 1998 is the first attempt to simplify the 
way frames and the nodes inside are configured. This new frame replaces the 
old frames. The most noticeable difference between the new and old frame is 
the power supply size. Also, the new tall frame is shorter and deeper than the 
old tall frame. With the new offering, IBM simplified the SP frame options by 
decoupling the embedded node from the frame offering. Therefore, when you
order a frame, all you receive is a frame with the power supply unit(s) and a 
power cord. All nodes, switches, and other auxiliary equipment are ordered 
separately. 

All new designs are completely compatible with all valid SP configurations 
using older equipment. Also, all new nodes can be installed in any existing SP 
frame provided that the required power supply upgrades have been 
implemented in that frame. 

-Note- 

Tall frames and short frames cannot be mixed in an SP system. 


2.3.1 Tall frames 

The tall model frame (model 550) and the tall expansion frame (feature code 
#1550) each have eight drawers, which hold internal nodes and an optional 
switch board. Depending on the type of node selected, an SP tall frame can
contain up to a maximum of 16 thin nodes, eight Wide nodes, or four High 
nodes. Node types may be mixed in a system and scaled up to 128 nodes 
(512 by special request). 

2.3.2 Short frames


The short model frame (model 500) and the short expansion frame (feature 
code #1500) each have four drawers, which hold internal nodes, and an 
optional switch board. Depending on the type of node selected, an SP short 
frame can contain up to a maximum of eight thin nodes, four Wide nodes, or 
two High nodes. Also, node types can be mixed and scaled up to only eight 
nodes. Therefore, for a large configuration or high scalability, tall frames are 
recommended. 

Only the short model frame can be equipped with a switch board. The short 
expansion frame cannot hold a switch board, but nodes in the expansion 
frame can share unused switch ports in the model frame. 

Figure 3 illustrates short frame components from the front view. 



2.3.3 SP Switch frames 

The SP Switch frame is defined as a base offering tall frame equipped with 
either four or eight Intermediate Switch Boards (ISB). This frame does not 
contain processor nodes. It is used to connect model frames and switched 
expansion frames that have maximized the capacity of their integral switch 
boards. Switch frames can only be connected to data within the local SP 
system. 


IBM Certification Study Guide RS/6000 SP

The base level SP Switch frame (feature code #2031) contains four ISBs. An 
SP Switch frame with four ISBs will support up to 128 nodes. The base level 
SP Switch frame can also be configured into systems with fewer than 65 
nodes. In this environment, the SP Switch frame will greatly simplify future 
system growth. Figure 4 shows an SP Switch frame with eight ISBs. 

-Note-

The SP Switch frame is required when the sixth SP frame with an SP 
Switch board is added to the system and is a mandatory prerequisite for all 
large scale systems. 



Figure 4. SP Switch frame with eight Intermediate Switch Boards (ISB)


2.3.4 Power supplies 

Tall frames come equipped with redundant (N+1) power supplies; if one 
power supply fails, another takes over. Redundant power is an option with the 
short frames (feature code #1213). These power supplies are self-regulating 
units. Power units with the N+1 feature are designed for concurrent 
maintenance; if a power unit fails, it can be removed and repaired without 
interrupting the running processes on the nodes. 

A tall frame has four power supplies. In a fully populated frame, the frame can 
operate with only three power supplies (N+1). Short frames come with one 
power supply, and a second, optional one, can be purchased for N+1 support. 

Figure 5 on page 12 illustrates tall frame components from front and rear
views. 

The power consumption depends on the number of nodes installed in the 
frame. For details, refer to RS/6000 SP: Planning Vol 1, Hardware and 
Physical Environment , GA22-7280. 



Figure 5. Front and rear views of tall frame components 


2.3.5 Hardware control and supervision 

Each frame (tall and short) has a supervisor card. This supervisor card 
connects to the control workstation through a serial link as shown in Figure 6 
on page 13. 

The supervisor subsystem consists of the following components: 

• Node supervisor card (one per processor node) 

• Switch supervisor card (one per switch assembly) 

• Internal cable (one per thin processor node or switch assembly) 

• Supervisor bus card (one per thin processor node or switch assembly) 

• Frame supervisor card 

• Serial cable (RS-232)

• Service and Manufacturing Interface (SAMI) cable 



Figure 6. Frame supervisor attachment 

There is a cable that connects from the frame supervisor card (position A) to 
the switch supervisor card (position B) on the SP Switch or the SP-Switch-8 
boards and to the node supervisor card (position C) of every node in the 
frame. Therefore, the control workstation can manage and monitor frames, 
switches, and all in-frame nodes. 


2.4 Standard nodes 

The basic RS/6000 SP building block is the server node or standard node. 
Each node is a complete server system comprising processor(s), memory,
internal disk drive, expansion slots, and its own copy of the AIX operating 
system. The basic technology is shared with standard RS/6000 workstations 
and servers, but differences exist that allow nodes to be centrally managed. 
There is no special version of AIX for each node. The same version runs on 
all RS/6000 systems. 

Standard nodes can be classified as those that are inside the RS/6000 SP 
frame and those that are not. 

2.4.1 Internal nodes 

Internal nodes can be classified, based on their physical size, as Thin, Wide, 
and High nodes. Thin nodes occupy one slot of an SP frame, while Wide 
nodes occupy one full drawer of an SP frame. A High node occupies two full 
drawers (four slots). 

Since 1993, when IBM announced the RS/6000 SP, there have been 14 
internal node types, excluding some special on-request node types. The five
most current nodes are: 160 MHz Thin P2SC node, 332 MHz SMP Thin node,
332 MHz SMP Wide node, POWER3 SMP Thin node, and POWER3 SMP 
Wide node. Only the 160 MHz Thin P2SC node utilizes Micro Channel 
Architecture (MCA) bus architecture while the others use PCI bus 
architecture. 

160 MHz Thin P2SC nodes 

This node is based on the POWER2 Super Chip (P2SC) implementation of 
the POWER architecture. Each node contains a 160 MHz P2SC processor 
combining IBM RISC microprocessor technology and the IBM implementation 
of the UNIX operating system, AIX. The standard memory in each node is 64 
MB expandable to 1 GB maximum. The minimum internal disk storage in 
each node is 4.5 GB expandable to 18.2 GB. Each node has two disk bays, 
four Micro Channel slots, and integrated SCSI-2 Fast/Wide and Ethernet (10 
Mbps) adapters. This node is equivalent to the RS/6000 stand-alone model 
7012-397. 

332 MHz SMP Thin nodes 

This node is the first PCI architecture bus node of the RS/6000 SP. Each
node has two or four PowerPC 604e processors running at 332 MHz, two memory
slots with 256 MB of memory expandable to 3 GB, and integrated Ethernet
(10 Mbps) and SCSI-2 Fast/Wide I/O adapters to maximize the number of slots
available for application use. This Thin node has two internal disk bays
with a maximum of 18.2 GB (mirror) and two PCI I/O expansion slots (32-bit).
The 332 MHz SMP Thin node can be upgraded to the 332 MHz SMP Wide node.

332 MHz SMP Wide nodes 

The 332 MHz SMP Wide node is a 332 MHz SMP Thin node combined with 
additional disk bays and PCI expansion slots. This Wide node has four 
internal disk bays with a maximum of 36.4 GB (mirror) and ten PCI I/O 
expansion slots (three 64-bit, seven 32-bit). Both 332 MHz SMP Thin and 
Wide nodes are based on the same technology as the RS/6000 model H50 
and have been known as the Silver nodes. Figure 7 shows a 332 MHz SMP 
node component diagram. 



POWER3 SMP Thin nodes 

This node is the first 64-bit internal processor node of the RS/6000 SP. Each 
node has a one- or two-way (within two processor cards) configuration 
utilizing a 64-bit 200 MHz POWER3 processor with a 4 MB Level 2 (L2) cache 
per processor. The standard ECC SDRAM memory in each node is 256 MB 
expandable up to 4 GB (within two card slots). This new node is shipped with
disk pairs as a standard feature to encourage the use of mirroring to
significantly improve system availability. This Thin node has two internal disk
bays for pairs of 4.5 GB, 9.1 GB, and 18.2 GB Ultra SCSI disks. Optional
pairs of 9.1 GB and 18.2 GB Ultra SCSI 10K RPM disks are also newly
available. Each node has two 32-bit PCI slots and integrated 10/100
Ethernet and Ultra SCSI adapters. The POWER3 SMP Thin node can be 
upgraded to the POWER3 SMP Wide node. 

POWER3 SMP Wide nodes 

The POWER3 SMP Wide node is a POWER3 SMP Thin node combined with 
additional disk bays and PCI expansion slots. This Wide node has four 
internal disk bays for pairs of 4.5 GB, 9.1 GB, and 18.2 GB Ultra SCSI disks.
Optional pairs of 9.1 GB, 18.2 GB, and 36.4 GB Ultra SCSI 10K RPM disks are
also newly available. The 36.4 GB drive pairs are available only for the
I/O side DASD bays. Each node has ten PCI slots (two
32-bit, eight 64-bit). Both POWER3 SMP Thin and Wide nodes are equivalent 
to the RS/6000 43P model 260. A diagram of the POWER3 SMP node is 
shown in Figure 8. Notice that it uses docking connectors (position A) instead 
of flex cables as in the 332 MHz node. 



The minimum software requirements for POWER3 SMP Thin and Wide nodes
are AIX Version 4.3.2 and PSSP Version 3.1.

Table 1 shows a comparison of current nodes. 


Table 1. Current nodes comparison

Node type          160 MHz         332 MHz         332 MHz         POWER3          POWER3
                   Thin            SMP Thin        SMP Wide        SMP Thin        SMP Wide

Processor          160 MHz         332 MHz         332 MHz         200 MHz         200 MHz
                   P2SC            2- or 4-way     2- or 4-way     1- or 2-way     1- or 2-way
                                   PowerPC 604e    PowerPC 604e    POWER3          POWER3

L1 Cache           32 KB / 128 KB  32 KB / 32 KB   32 KB / 32 KB   32 KB / 64 KB   32 KB / 64 KB
(Instr./Data)
per processor

L2 Cache (per      -               256 KB          256 KB          4 MB            4 MB
processor)

Std. Memory        64 MB           256 MB          256 MB          256 MB          256 MB

Max. Memory        1 GB            3 GB            3 GB            4 GB            4 GB

Memory Slots       4               2               2               2               2

Disk Bays          2               2               4               2               4

Min. Int. Disk     4.5 GB          None Required   None Required   None Required   None Required

Max. Int. Disk     18.2 GB         36.4 GB or      72.8 GB or      36.4 GB or      72.8 GB or
                                   18.2 GB         36.4 GB         18.2 GB         54.6 GB
                                   (Mirror)        (Mirror)        (Mirror)        (Mirror)

Expansion Slots    4 MCA           2 PCI           10 PCI          2 PCI           10 PCI
                                   (32-bit)        (3 64-bit,      (32-bit)        (8 64-bit,
                                                   7 32-bit)                       2 32-bit)

Adapters           Integrated      Integrated      Integrated      Integrated      Integrated
                   SCSI-2 F/W and  SCSI-2 F/W and  SCSI-2 F/W and  Ultra SCSI and  Ultra SCSI and
                   Ethernet        Ethernet        Ethernet        Ethernet        Ethernet
                   (10 Mbps)       (10 Mbps)       (10 Mbps)       (10/100 Mbps)   (10/100 Mbps)


2.4.1.1 332 MHz SMP node system architecture 

The 332 MHz SMP Thin and Wide nodes provide two- or four-way symmetric 
multiprocessing utilizing PowerPC technology and extend the RS/6000 PCI 
I/O technology to the SP system. With their outstanding integer performance, 
these nodes are ideal for users who need mission-critical commercial 
computing solutions. The 332 MHz SMP node system structure is shown in 
Figure 9 on page 18. 

Figure 9. 332 MHz SMP node system architecture

Processor and Level 2 cache controller 

The 332 MHz SMP node contains two- or four-way 332 MHz PowerPC 604e 
processors, each with its own 256 KB Level 2 cache. The X5 Level 2 cache 
controller incorporates several technological advancements in design 
providing greater performance over traditional cache designs. The cache 
controller implements an eight-way, dual-directory, set-associative cache 
using SDRAM. When instructions or data are stored in a cache, they are 
grouped into sets of eight 64-byte lines. The X5 maintains an index to each of 
the eight sets. It also keeps track of the tags used internally to identify each 
cache line. Dual tag directories allow simultaneous processor requests and 
system bus snoops, thus, reducing resource contention and speeding up 
access. 
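
As a rough illustration of the cache organization described above, the following Python sketch derives the number of lines and sets from the quoted figures (256 KB per processor, 64-byte lines, eight lines per set). The split of an address into tag, set index, and byte offset follows the usual textbook convention for set-associative caches; it is an assumption for illustration and not a documented detail of the X5 controller.

```python
# Cache geometry from the figures quoted in the text.
CACHE_BYTES = 256 * 1024   # 256 KB Level 2 cache per processor
LINE_BYTES = 64            # 64-byte cache lines
WAYS = 8                   # eight lines per set


def cache_geometry(cache_bytes: int, line_bytes: int, ways: int):
    """Return (lines, sets, offset_bits, index_bits) for power-of-two sizes."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = sets.bit_length() - 1          # log2(sets)
    return lines, sets, offset_bits, index_bits


def split_address(addr: int, offset_bits: int, index_bits: int):
    """Split an address into (tag, set index, byte offset).

    Textbook mapping, assumed here for illustration only.
    """
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    lines, sets, off_bits, idx_bits = cache_geometry(CACHE_BYTES, LINE_BYTES, WAYS)
    print(lines, sets, off_bits, idx_bits)
    print(split_address(0x12345, off_bits, idx_bits))
```

Under these assumptions the cache holds 4096 lines arranged as 512 sets of eight lines each.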

System bus 


The SMP system bus is optimized for high performance and multiprocessing 
applications. It has a separate 64-bit address bus and 128-bit data bus. 
These buses operate independently in the true split transaction mode and are 
aggressively pipelined. For example, new requests may be issued before 
previous requests are completed. There is no sequential ordering 
requirement. Each operation is tagged with an 8-bit tag, which allows
up to 256 transactions to be in progress in the system at any one
time.
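
The effect of tagging can be pictured with a toy model. The Python sketch below is not a description of the actual bus logic; `TagPool` is a hypothetical class that only shows why an 8-bit tag bounds the number of outstanding transactions at 2^8 = 256 and why tagged requests may complete in any order.

```python
class TagPool:
    """Toy model: each outstanding request holds a unique tag until it completes."""

    def __init__(self, tag_bits: int = 8):
        self.free = list(range(2 ** tag_bits))   # 256 tags for an 8-bit tag
        self.in_flight = {}                      # tag -> request

    def issue(self, request):
        """Start a request; fails if every tag is already in use."""
        if not self.free:
            raise RuntimeError("all tags in use; request must wait")
        tag = self.free.pop(0)
        self.in_flight[tag] = request
        return tag

    def complete(self, tag):
        """Finish the request with this tag, in any order, and recycle the tag."""
        request = self.in_flight.pop(tag)
        self.free.append(tag)
        return request


if __name__ == "__main__":
    pool = TagPool()
    first = pool.issue("read A")
    second = pool.issue("read B")
    # The second request completes before the first one.
    print(pool.complete(second), pool.complete(first))
```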

System memory 

The 332 MHz SMP node supports 256 MB to 3 GB of 10-nanosecond 
SDRAM. System memory is controlled by the memory-I/O chip, which is
capable of providing a sustained memory bandwidth of over 1.3 GB per 
second. The memory controller supports up to two memory cards with up to 
eight increments of SDRAM on each card. 

I/O subsystem 

The memory-I/O controller implements a 64-bit, multiplexed address and data
bus for attaching several PCI I/O buses and the SP Switch MX adapter. This
bus runs concurrently with, and independently from, the system and memory
buses. The peak bandwidth of this bus is 400 MB per second. Two 32-bit PCI 
slots are in the thin node, and three additional 64-bit PCI slots and five 32-bit 
PCI slots are in the Wide node. 

2.4.1.2 POWER3 SMP node system architecture 

The POWER3 SMP node has excellent performance for compute-intensive 
analysis applications. The heart of this node is the POWER3 microprocessor 
based on IBM PowerPC architecture and RS/6000 Platform architecture. It 
provides a high bandwidth interface to a fast Level 2 (L2) cache and a 
separate high bandwidth interface to memory and other system functions. 
The POWER3 microprocessor implements the 64-bit PowerPC architecture 
and is fully compatible with existing 32-bit applications. 

The POWER3 SMP node system structure is shown in Figure 10 on page 20. 

Figure 10. POWER3 SMP node system architecture

POWER3 microprocessor

The POWER3 is a single chip implemented with 0.25 micron CMOS 
technology. It operates at a 200 MHz clock cycle. The POWER3 design 
contains a superscalar core that is comprised of eight execution units and 
allows concurrent operation of fixed-point, load/store, branch, and 
floating-point instructions. The processor can perform up to four floating-point 
operations per clock cycle. There is a 32 KB instruction and a 64 KB data 
Level 1 cache integrated within a single chip. Both instruction and data cache 
are parity protected. There is a 256-bit external interface to the 4 MB Level 2 
cache, which operates at 200 MHz and is ECC protected. 

System bus 

The system bus, referred to as the 6XX bus, connects up to two POWER3 
processors to the memory-I/O controller chip set. It provides 40 bits of real
address and a separate 128-bit data bus. The address, data, and tag buses
are fully parity protected. The 6XX bus runs at a 100 MHz clock rate, and
peak data throughput is 1.6 GB/second.

System memory 

The POWER3 SMP node supports 256 MB to 4 GB of 10-nanosecond 
SDRAM. System memory is controlled by the memory-I/O chip set through
the memory bus. The memory bus consists of a 128-bit data bus and
operates at a 100 MHz clock rate. It is separated from the system bus (6XX
bus), which allows for concurrent operations on these two buses. For 
example, cache-to-cache transfers can occur while a Direct Memory Access 
(DMA) operation is proceeding to an I/O device. There are two memory card 
slots each supporting up to 16 128 MB memory DIMMs. Memory DIMMs must 
be used in pairs, and at least one memory card with a minimum of 256 MB 
memory is required to be operational. System memory is protected with a 
Single Error Correction, Double Error Detection ECC code. 
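
The memory limits follow directly from the card and DIMM counts: two cards with 16 DIMMs of 128 MB each give 4 GB, and the smallest legal configuration, one pair of DIMMs on one card, gives 256 MB. A minimal Python sketch of these rules follows. The function name is hypothetical, and the sketch assumes only 128 MB DIMMs, which is the only size the text mentions.

```python
# Memory configuration rules quoted in the text for the POWER3 SMP node.
DIMM_MB = 128              # DIMM size mentioned in the text (assumed the only size)
MAX_DIMMS_PER_CARD = 16
CARD_SLOTS = 2


def total_memory_mb(dimms_per_card: list) -> int:
    """Validate DIMM counts (one entry per installed card) and return total MB."""
    if not 1 <= len(dimms_per_card) <= CARD_SLOTS:
        raise ValueError("one or two memory cards must be installed")
    for count in dimms_per_card:
        if count % 2 != 0:
            raise ValueError("DIMMs must be installed in pairs")
        if not 2 <= count <= MAX_DIMMS_PER_CARD:
            raise ValueError("each card holds 2 to 16 DIMMs")
    return sum(dimms_per_card) * DIMM_MB


if __name__ == "__main__":
    print(total_memory_mb([2]))        # smallest configuration
    print(total_memory_mb([16, 16]))   # fully populated node
```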

I/O subsystem 

The memory-I/O controller chip set implements a 64-bit plus parity,
multiplexed address and data bus (6XX-MX bus) for attaching three PCI
controller chips and the SP Switch MX2 adapter. The 6XX-MX bus runs at a 60
MHz clock rate, and the peak bandwidth of the 6XX-MX bus is 480
MB/second. The three PCI controllers attached to the 6XX-MX bus provide
the interface for ten PCI slots. Two 32-bit PCI slots are in the thin node, and
eight additional 64-bit PCI slots are in the Wide node. One of the PCI
controller chips (controller chip 0) provides support for integrated Ultra2 SCSI
and 10Base2, 10/100BaseT Ethernet functions. The Ultra2 SCSI interface
supports up to four internal disks. An ISA bridge chip is also attached to PCI 
controller chip 0 for supporting two serial ports and other internally used 
functions in the POWER3 SMP node. 
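
The peak bandwidth figures quoted for the POWER3 node buses are consistent with a simple product of data width and clock rate. The Python sketch below assumes one data transfer per clock cycle and decimal units (1 GB = 1000 MB); the function name is illustrative. The value for the memory bus is derived from its stated width and clock, since the text does not quote a bandwidth for it.

```python
def peak_mb_per_second(width_bits: int, clock_mhz: int) -> int:
    """Peak bandwidth in MB/second, assuming one transfer per clock cycle."""
    bytes_per_transfer = width_bits // 8
    return bytes_per_transfer * clock_mhz


if __name__ == "__main__":
    buses = {
        "6XX system bus (128-bit, 100 MHz)": (128, 100),
        "memory bus (128-bit, 100 MHz)": (128, 100),
        "6XX-MX bus (64-bit, 60 MHz)": (64, 60),
    }
    for name, (width, clock) in buses.items():
        print(name, peak_mb_per_second(width, clock), "MB/second")
```

With these inputs the 6XX bus comes to 1600 MB/second (1.6 GB/second) and the 6XX-MX bus to 480 MB/second, matching the figures in the text.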

Service processor 

The service processor function is integrated on the I/O planar board. This
service processor performs system initialization, system error recovery, and 
diagnostic functions that give the POWER3 SMP node a high level of 
availability. The service processor is designed to save the state of the system 
to 128 KB of non-volatile memory (NVRAM) to support subsequent diagnostic 
and recovery actions taken by other system firmware and the AIX operating 
system. 

2.4.2 External nodes


An external node is a kind of processor node that cannot be housed in the
frame due to its large size. The currently supported external nodes are
RS/6000 model S70, RS/6000 model S70 Advanced (S7A), and RS/6000
model S80. All are large, enterprise-class servers built on a 64-bit symmetric
multiprocessor (SMP) system that supports 32- and 64-bit applications
concurrently. The bus architecture in these servers is PCI architecture. The
differences between these models are the base processor (PowerPC RS64
125 MHz - S70 Server, PowerPC RS64 II 262 MHz - S70 Advanced Server,
and PowerPC RS64 III 450 MHz - S80), standard memory, and high
availability I/O drawer on the S70 Advanced Server. The external node is
known as an SP-Attached server.

These servers excel in capacity and scalability in On-line Transaction 
Processing (OLTP), Server Consolidation, Supply Chain Management, and 
Enterprise Resource Planning (ERP), such as SAP, where single large 
database servers are required. 

2.4.2.1 SP-Attached servers 

The RS/6000 7017 Enterprise Server Model S70, Model S7A, and Model S80 
are packaged in two side-by-side units. The first unit is the Central 
Electronics Complex (CEC). The second unit is a standard 19-inch I/O tower. 
Up to three more I/O towers can be added to a system. Figure 11 on page 23 
shows the RS/6000 7017 Enterprise Server scalability. 

The Central Electronics Complex contains: 

• Either 64-bit 125 MHz PowerPC RS64 I processors (S70), 262 MHz 
PowerPC RS64 II processors (S7A), or 450 MHz PowerPC RS64 III 
processors (S80). 

• Optional 4-way processor cards (the same processor) that scale 
configuration to 8-way or 12-way SMP processing. The S80 has 6-way 
cards and scales up to 24-way. 

• 4 MB ECC L2 cache memory per 125 MHz processor or 8 MB per 262 
MHz and 450 MHz processor. 

• Standard 512 MB ECC SDRAM memory that expands to 16 GB (S70), 32 
GB (S7A), or 64 GB (S80). 

• A high-speed, multi-path switch. 

• A memory controller and system memory. 

• Two high-speed memory ports with a total collective memory bandwidth 
of up to 5.6 GB per second. 

• A base configuration consisting of a 4-way SMP processor card. 

Each I/O rack accommodates up to two I/O drawers (maximum four drawers 
per system) with additional space for storage and communication 
subsystems. The base I/O drawer contains: 

• A high-performance 4.5 GB UltraSCSI disk drive 

• A 20X (Max) CD-ROM 

• A 1.44 MB 3.5-inch diskette drive 

• A service processor 

• Eleven available PCI slots 

• Two available media bays 

• Eleven available hot-swappable disk drive bays 

Each additional I/O drawer contains: 

• Fourteen available PCI slots (nine 32-bit and five 64-bit) providing an 
aggregate data throughput of 500 MB per second to the I/O hub. 

• Three available media bays. 

• Twelve available hot-swappable disk drive bays. 

When all four I/O drawers are installed, the 7017 contains twelve media bays, 
forty-eight hot-swappable disk drive bays, and fifty-six PCI slots per system. 
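The per-system totals follow from the per-drawer capacities. The following is a minimal sketch of that arithmetic. It assumes (this is not stated above) that the base I/O drawer has the same physical capacity as an additional drawer, with some positions already occupied by its standard devices. The names are illustrative only.

```python
# Per-drawer capacities quoted above for an additional I/O drawer.
# Assumption: the base drawer has the same physical capacity, with some
# positions already occupied by its standard devices.
DRAWER_CAPACITY = {"pci_slots": 14, "media_bays": 3, "disk_bays": 12}
MAX_DRAWERS = 4  # two drawers per I/O rack, at most four drawers per system


def system_totals(drawers=MAX_DRAWERS):
    """Return total PCI slots, media bays, and disk bays for a 7017 system."""
    if not 1 <= drawers <= MAX_DRAWERS:
        raise ValueError("a 7017 system has between one and four I/O drawers")
    return {name: count * drawers for name, count in DRAWER_CAPACITY.items()}


print(system_totals())
```

With all four drawers the sketch reproduces the fifty-six PCI slots, twelve media bays, and forty-eight disk bays quoted above.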



[Figure 11 labels: CEC; I/O racks; one I/O drawer required in first rack; 
56 slots (4 I/O drawers); up to 38 terabytes of storage (max); 880 lbs; 
286 lbs (empty); processors; memory; I/O drawers; power; 95 to 135 lbs] 


Figure 11. RS/6000 7017 Enterprise Server S70/S7A/S80 system scalability 

2.4.2.2 SP-Attached server attachment 

It is important to note that the size of the S70, S7A, and S80 prevents them 
from being physically mounted in the SP frame. Because the SP-Attached 
server is mounted in its own rack and is directly attached to the control 
workstation using two custom RS-232 cables, the SP system must view the 
SP-Attached server as a frame. Therefore, the SP system views the 
SP-Attached server as an object with both frame and node characteristics. 

The SP-Attached server requires a minimum of four connections with the SP 
system in order to establish a functional and safe network. If your SP system 
is configured with an SP Switch, there will be five required connections as 
shown in Figure 12 on page 25. 

Three connections are required with the control workstation. 

1. An Ethernet connection to the SP-LAN for system administration 
purposes 

2. A custom RS-232 cable connecting the control workstation to the SAMI 
port of the SP-Attached server (front panel) 

3. A second custom RS-232 cable connecting the control workstation to 
the serial port of the SP-Attached server (S1 port) 

The fourth connection is a 10 m frame-to-frame electrical ground cable. 

The fifth connection is required if the SP system is configured with an SP 
Switch. 

Figure 12. The SP-attached server connection 


2.5 Dependent nodes 

Dependent nodes are non-standard nodes that extend the SP system’s 
capabilities but cannot be used in all of the same ways as standard SP 
processor nodes. A dependent node depends on SP nodes for certain 
functions but implements much of the switch-related protocol that standard 
nodes use on the SP Switch. Typically, dependent nodes consist of four major 
components as follows: 

1. A physical dependent node - The hardware device requiring SP 
processor node support. 

2. A dependent node adapter - A communication card mounted in the 
physical dependent node. This card provides a mechanical interface for 
the cable connecting the physical dependent node to the SP system. 

3. A logical dependent node - Made up of a valid, unused node slot and 
the corresponding unused SP Switch port. The physical dependent 
node logically occupies the empty node slot by using the corresponding 
SP Switch port. The switch port provides a mechanical interface for the 
cable connecting the SP system to the physical dependent node. 

4. A cable - To connect the dependent node adapter with the logical 
dependent node. It connects the extension node to the SP system. 

2.5.1 SP Switch Router 

A specific type of dependent node is the IBM 9077 SP Switch Router. The 
9077 is a licensed version of the Ascend GRF (Goes Real Fast) switched IP 
router that has been enhanced for direct connection to the SP Switch. The SP 
Switch Router was known as the High Performance Gateway Node (HPGN) 
during the development of the adapter. These optional external devices can 
be used for high-speed network connections or system scaling using High 
Performance Parallel Interface (HIPPI) backbones or other communications 
subsystems, such as ATM or 10/100 Ethernet (see Figure 13 on page 27). 

An SP Switch Router may have multiple logical dependent nodes, one for 
each dependent node adapter it contains. For an SP Switch Router, the 
dependent node adapter is called a Switch Router Adapter (feature code 
#4021). If an SP Switch Router contains more than one dependent node 
adapter, it can route data between SP systems or system partitions. Data 
transmission is accomplished by linking the dependent node adapters in the 
switch router with the logical dependent nodes located in different SP 
systems or system partitions. 

In addition to the four major dependent node components, the SP Switch 
Router has a fifth optional category of components. These components are 
networking cards that fit into slots in the SP Switch Router. In the same way 
that the SP Switch Router Adapter connects the SP Switch Router directly to 
the SP Switch, these networking cards enable the SP Switch Router to be 
directly connected to an external network. The following networks can be 
connected to the RS/6000 SP Switch Router using available media cards: 

• Ethernet 10/100 Base-T 

• FDDI 

• ATM OC-3c (single or multimode fiber) 

• SONET OC-3c (single or multimode fiber) 

• ATM OC-12c (single or multimode fiber) 

• HIPPI 

• HSSI (High Speed Serial Interface) 

Figure 13. SP Switch router 

Although you can equip an SP node with a variety of network adapters and 
use the node to make your network connections, the SP Switch Router with 
the Switch Router Adapter and optional network media cards offers many 
advantages when connecting the SP to external networks. 

• Each media card contains its own IP routing engine with separate 
memory containing a full route table of up to 150,000 routes. Direct 
access provides much faster lookup times compared to software driven 
lookups. 

• Media cards route IP packets independently at rates of 60,000 to 
130,000 IP packets per second. With independent routing available 
from each media card, the SP Switch Router gives your SP system 
excellent scalability characteristics. 

• The SP Switch Router has a dynamic network configuration to bypass 
failed network paths using standard IP protocols. 

• Using multiple Switch Router Adapters in the same SP Switch Router, 
you can provide high performance connections between system 
partitions in a single SP system or between multiple SP systems. 

• A single SP system can also have more than one SP Switch Router 
attached to it, further ensuring network availability. 

• Media cards are hot swappable for uninterrupted SP Switch Router 
operations. 

• Each SP Switch Router has redundant (N+1) hot swappable power 
supplies. 

Two versions of the RS/6000 SP Switch Router can be used with the SP 
Switch. The Model 04S (GRF 400) offers four media card slots, and the Model 
16S (GRF 1600) offers sixteen media card slots. Except for the additional 
traffic capacity of the Model 16S, both units offer similar performance and 
network availability as shown in Figure 14. 

Figure 14. GRF models 400 and 1600 


2.5.2 SP Switch Router attachment 

The SP Switch Router requires a minimum of three connections with your SP 
system in order to establish a functional and safe network. These connections 
are: 

1. A network connection with the control workstation - The SP Switch Router 
must be connected to the control workstation for system administration 
purposes. This connection may be either: 

• A direct Ethernet connection between the SP Switch Router and the 
control workstation. 

• An Ethernet connection from the SP Switch Router to an external 
network, which then connects to the control workstation. 

2. A connection between an SP Switch Router Adapter and the SP Switch - 
The SP Switch Router transfers information into and out of the processor 
nodes of your SP system. The link between the SP Switch Router and the 
SP processor nodes is implemented by: 

• An SP Switch Router adapter 

• A switch cable connecting the SP Switch Router adapter to a valid 
switch port on the SP Switch 

3. A frame-to-frame electrical ground - The SP Switch Router frame must be 
connected to the SP frame with a grounding cable. This frame-to-frame 
ground is required in addition to the SP Switch Router electrical ground. 
The purpose of the frame-to-frame ground is to maintain the SP and SP 
Switch Router systems at the same electrical potential. 

For more detailed information, refer to IBM 9077 SP Switch Router: Get 
Connected to the SP Switch, SG24-5157. 


2.6 Control workstation 

The RS/6000 SP system requires an RS/6000 workstation. The control 
workstation serves as a central point of control with the PSSP and other 
optional software for managing and maintaining the RS/6000 SP frames and 
individual processor nodes. It connects to each frame through an RS232 line 
to provide hardware control functions. The control workstation connects to 
each external node or SP-Attached server with two custom RS232 cables, but 
hardware control is minimal because SP-Attached servers do not have an SP 
frame or SP node supervisor. A system administrator can log in to the control 
workstation from any other workstation on the network to perform system 
management, monitoring, and control tasks. 

The control workstation also acts as a boot/install server for other servers or 
nodes in the SP system. In addition, the control workstation can be set up as 
an authentication server using Kerberos. It can be the Kerberos primary 
server with the master database and administration service as well as the 
ticket-granting service. As an alternative, the control workstation can be set 
up as a Kerberos secondary server with a backup database to perform 
ticket-granting service. 

An optional High Availability Control Workstation (HACWS) allows a backup 
control workstation to be connected to an SP system. The second control 
workstation provides backup when the primary workstation requires update 
service or fails. Planning and using the HACWS will be simpler if you 
configure your backup control workstation identically to the primary control 
workstation. Some components must be identical; others can be similar. 


2.6.1 Supported control workstations 

There are two basic types of control workstations: 

• MCA-based control workstations 

• PCI-based control workstations 

Both types of control workstations must be connected to each frame through 
an RS-232 cable and the SP Ethernet BNC cable. These 15 m cables are 
supplied with each frame. Thus, the CWS must be no more than 12 m from 
each frame, leaving 3 m of cable for the vertical portion of the cable runs. If 
you need longer vertical runs, or if there are under-floor obstructions, you 
must place the CWS closer to the frame. 
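The placement rule above is a simple cable budget. As a minimal sketch (the function and parameter names are illustrative, not part of any IBM tool), the maximum horizontal distance is the supplied cable length minus whatever the vertical runs consume:

```python
CABLE_LENGTH_M = 15.0  # length of the cables supplied with each frame


def max_horizontal_distance(vertical_run_m=3.0, cable_length_m=CABLE_LENGTH_M):
    """Return how far (in meters) the CWS can be placed from a frame."""
    if vertical_run_m < 0 or vertical_run_m > cable_length_m:
        raise ValueError("vertical run must be between 0 and the cable length")
    return cable_length_m - vertical_run_m


print(max_horizontal_distance())     # default 3 m of vertical run
print(max_horizontal_distance(5.0))  # longer vertical runs shorten the reach
```

With the default 3 m of vertical run, the result is the 12 m limit stated above. A 5 m vertical run would leave 10 m.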

MCA-based control workstations: 

• RS/6000 7012 Models 37T, 370, 375, 380, 39H, 390, 397, G30, and G40 

• RS/6000 7013 Models 570, 58H, 580, 59H, 590, 591, 595, J30, J40, and 
J50 (see note 1) 

• RS/6000 7015 Models 97B, 970, 98B, 980, 990, R30, R40, and R50 (see 
notes 1 and 2) 

• RS/6000 7030 Models 3AT, 3BT, and 3CT 

-Note- 

The MCA-based Control Workstation requires the following: 

1. A 7010 Model 150 X-Station and display. Other models and 
manufacturers that meet or exceed this model can be used. An 
ASCII terminal is required as the console. 

2. Installation in either the 7015-99X or 7015-R00 rack. 


PCI-based Control Workstations: 

• RS/6000 7024 Models E20 and E30 (see note 1) 

• RS/6000 7025 Model F30 (see notes 1 and 2) 

• RS/6000 7025 Model F40 (see notes 3 and 4) 

• RS/6000 7025 Model F50 (see note 3) 

• RS/6000 7026 Models H10 and H50 (see notes 3 and 4) 

• RS/6000 7043 43P Models 140 and 240 (see notes 3, 5, and 6) 


-Note- 

PCI-based Control Workstations require the following: 

1. Supported by PSSP 2.2 and later 

2. On systems introduced since PSSP 2.4, either the 8-port (feature 
code #2943) or 128-port (feature code #2944) PCI bus 
asynchronous adapter should be used for frame controller 
connections. IBM strongly suggests you use the support processor 
option (feature code #1001). If you use this option, the frames must 
be connected to a serial port on an asynchronous adapter and not to 
the serial port on the control workstation planar board. 

3. The native RS232 ports on the system planar cannot be used as tty 
ports for the hardware controller interface. The 8-port asynchronous 
adapter EIA-232/RS-422, PCI bus (feature code #2943), or the 
128-port Asynchronous Controller (feature code #2944) are the only 
RS232 adapters that are supported. These adapters require AIX 
4.2.1 or AIX 4.3 on the control workstation. 

4. IBM strongly suggests you use the support processor option 
(feature code #1001). 

5. The 7043 can only be used on SP systems with up to four frames. 
This limitation applies to the number of frames and not the number of 
nodes. This number includes expansion frames. 

6. The 7043-43P is NOT supported as a control workstation whenever 
an S70/S7A/S80 is attached to the SP. The limitation is due to the 
load that the extra daemons place on the control workstation. 
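Notes 5 and 6 together define when a 7043 may serve as the control workstation. The following is a minimal sketch of that check. The function name and parameters are hypothetical and do not correspond to any PSSP command.

```python
MAX_FRAMES_FOR_7043 = 4  # note 5: counts frames, including expansion frames


def can_use_7043_as_cws(frame_count, has_sp_attached_server):
    """Return True if notes 5 and 6 allow a 7043 as the control workstation.

    frame_count counts frames (not nodes) and includes expansion frames.
    has_sp_attached_server is True when an S70, S7A, or S80 is attached.
    """
    if frame_count < 1:
        raise ValueError("an SP system has at least one frame")
    return frame_count <= MAX_FRAMES_FOR_7043 and not has_sp_attached_server


print(can_use_7043_as_cws(4, False))
print(can_use_7043_as_cws(5, False))
print(can_use_7043_as_cws(2, True))
```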


2.6.2 Control workstation minimum hardware requirements 

The minimum hardware requirements for the control workstation are: 

• At least 128 MB of main memory. For SP systems with more than 80 
nodes, 256 MB is required, and 512 MB of memory is recommended. 

• 4 GB of disk storage plus 1 GB for each AIX release and modification 
level in your SP system. Additional disk space should be added for 
mksysb images for the nodes. Double the number of physical disks if 
you plan on using rootvg mirroring. 

• Physically installed with the RS232 cable to each SP frame. 

• Physically installed with two custom RS232 cables to each external 
node (SP-Attached server), such as an RS/6000 Enterprise Server Model 
S70, S70 Advanced, or S80. 

• With the following I/O devices and adapters: 

• A 3.5 inch diskette drive. 

• Four or eight millimeter (or equivalent) tape drive. 

• SCSI CD-ROM drive. 

• One RS232 port for each SP frame. 

• Keyboard and mouse. 

• Color graphics adapter and color monitor. An X-station model 150 
and display are required if an RS/6000 that does not support a color 
graphics adapter is used. 

• SP Ethernet adapters for connection to the SP Ethernet (see 3.3.1, 
“SP Ethernet” on page 87 for details). 
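The memory and disk minimums above can be turned into a quick sizing check. This is a sketch only: the function names are illustrative, and the space for mksysb images is left as a caller-supplied estimate because the text gives no figure for it.

```python
def min_memory_mb(node_count):
    """Minimum main memory in MB (512 MB is recommended above 80 nodes)."""
    return 256 if node_count > 80 else 128


def min_disk_gb(aix_levels, mksysb_gb=0.0):
    """Minimum disk storage in GB.

    aix_levels is the number of AIX release and modification levels in the
    SP system; mksysb_gb is the caller's own estimate for node images.
    """
    if aix_levels < 1:
        raise ValueError("at least one AIX level is installed")
    return 4 + 1 * aix_levels + mksysb_gb


print(min_memory_mb(64), min_memory_mb(96))
print(min_disk_gb(2), min_disk_gb(2, mksysb_gb=3.5))
```

If rootvg mirroring is planned, the number of physical disks that hold this space is doubled, as noted above.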

2.6.3 High Availability Control Workstation 

The design of the SP High Availability Control Workstation (HACWS) is 
modeled on the High Availability Cluster Multi-Processing for RS/6000 
(HACMP) licensed program product. HACWS utilizes HACMP running on two 
RS/6000 control workstations in a two-node rotating configuration. HACWS 
utilizes an external disk that is accessed non-concurrently between the two 
control workstations for storage of SP related data. There is also a Y-cable 
connected from the SP frame supervisor card to each control workstation. 
This HACWS configuration provides automated detection, notification, and 
recovery of control workstation failures. Figure 15 shows the logical view of 
the HACWS attachment. 

The primary and backup control workstations are also connected on a private 
point-to-point network and a serial TTY link or target mode SCSI. The backup 
control workstation assumes the IP address, IP aliases, and hardware 
address of the primary control workstation. This lets client applications run 
without changes. The client application, however, must initiate reconnects 
when a network connection fails. 

The HACWS has the following limitations and restrictions: 


• You cannot split the load across a primary and backup control workstation. 
Either the primary or the backup provides all the functions at one time. 

• The primary and backup control workstations must each be an RS/6000. 
You cannot use a node in your SP as a backup control workstation. 

• The backup control workstation cannot be used as the control workstation 
for another SP system. 

• The backup control workstation cannot be a shared backup of two primary 
control workstations. 

• There is a one-to-one relationship of primary to backup control 
workstations; a single primary and backup control workstation combination 
can be used to control only one SP system. 

• If a primary control workstation is an SP authentication server, the backup 
control workstation must be a secondary authentication server. 

• The S70, S70 Advanced, and S80 SP-Attached servers are directly 
attached to the control workstation through two RS232 serial connections. 
There is no dual RS232 hardware support for these connections like there 
is for SP frames. These servers can only be attached to one control 
workstation at a time. Therefore, when a control workstation fails, or 
scheduled downtime occurs, and the backup control workstation becomes 
active, you will lose hardware monitoring and control and serial terminal 
support for your SP-Attached servers. The SP-Attached servers will have 
the SP Ethernet connection from the backup control workstation; so, 
PSSP components requiring this connection will still work correctly. This 
includes components, such as the availability subsystems, user 
management, logging, authentication, the SDR, file collections, 
accounting, and others. 

For more detailed information, refer to RS/6000: Planning Volume 2, 
GA22-7281. 


2.7 Boot/install server requirements 

By default, the control workstation is the boot/install server. It is responsible 
for AIX and PSSP software installations to the nodes. You can also define 
other nodes to be a boot/install server. If you have multiple frames, the first 
node in each frame is selected by default as the boot/install server for all the 
nodes in its frame. 

When you select a node to be a boot/install server, the setup_server script 
copies all the necessary files to this node and configures it as a NIM master. 
All mksysbs and PSSP levels served by this boot/install server node are 
copied from the control workstation the first time setup_server is run against 
this node. The only NIM resource that is not maintained locally on this node 
is the lppsource. The lppsource always resides on the control workstation; 
so, when the lppsource NIM resource is created on boot/install servers, it 
only contains a pointer to the control workstation. The Shared Product 
Object Tree (SPOT) is created from the lppsource contents, but it is 
maintained locally on every boot/install server. 

Generally, you can have a boot/install server for every eight nodes. Also, you 
may want to consider having a boot/install server for each version of AIX and 
PSSP (although this is not required). 
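The default selection and the rule of thumb above can be sketched as follows. This is an illustration, not the logic of setup_server: the function names are hypothetical, and it assumes that the first node in each frame is itself installed from the control workstation (labeled "cws" here).

```python
import math


def default_boot_install_servers(frames):
    """Map each node to its default boot/install server.

    frames maps a frame number to the list of node numbers in that frame.
    Assumption: the first node of each frame is served by the control
    workstation, which is represented by the string "cws".
    """
    servers = {}
    multiple_frames = len(frames) > 1
    for nodes in frames.values():
        ordered = sorted(nodes)
        if not ordered:
            continue  # an empty frame contributes no nodes
        first = ordered[0]
        for node in ordered:
            if not multiple_frames or node == first:
                servers[node] = "cws"
            else:
                servers[node] = first
    return servers


def suggested_server_count(node_count, nodes_per_server=8):
    """Rule of thumb from the text: one boot/install server per eight nodes."""
    return math.ceil(node_count / nodes_per_server)


print(default_boot_install_servers({1: [1, 2, 3], 2: [17, 18]}))
print(suggested_server_count(17))
```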

The following requirements exist for all configurations: 

• Each boot/install server’s en0 Ethernet adapter must be directly 
connected to each of the control workstation’s Ethernet adapter(s). 

• The Ethernet adapter configured as en0 must always be in the SP node’s 
lowest hardware slot of all Ethernets. 

• The NIM clients that are served by boot/install servers must be on the 
same subnet as the boot/install server’s Ethernet adapter. 

• NIM clients must have a route to the control workstation over the SP 
Ethernet. 

• The control workstation must have a route to the NIM clients over the SP 
Ethernet. 
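The same-subnet requirement in the list above can be checked before installation. The following is a minimal sketch using Python's standard ipaddress module. The addresses and the /24 mask are hypothetical examples, not values taken from any SP configuration.

```python
import ipaddress


def same_subnet(server_interface, client_address):
    """Return True if the client lies in the server adapter's subnet.

    server_interface is the boot/install server's en0 address with its
    prefix length, for example "192.168.3.1/24" (a made-up address).
    """
    interface = ipaddress.ip_interface(server_interface)
    return ipaddress.ip_address(client_address) in interface.network


print(same_subnet("192.168.3.1/24", "192.168.3.17"))
print(same_subnet("192.168.3.1/24", "192.168.4.17"))
```

A client that fails this check would need to be moved to the server's subnet or assigned to a different boot/install server. The routing requirements between the NIM clients and the control workstation are separate and are not covered by this sketch.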

Figure 16 shows an example of a single frame with a boot/install server 
configured on node 1. 

Figure 16. Boot/Install servers 


2.8 SP Switch communication network 

During the initial development of the SP system, a high-speed 
interconnection network was required to enable communication between the 
nodes that made up the SP complex. The initial requirement was to support 
the demands of parallel applications that utilize the distributed memory MIMD 
programming model. More recently, the SP Switch network has been 
extended to a variety of purposes: 

• Primary network access for users external to the SP complex (when used 
with SP Switch Router) 

• Used by ADSM for node backup and recovery 

• Used for high-speed internal communications between various 
components of third-party application software (for example, SAP's R/3 
suite of applications) 

All of these applications are able to take advantage of the sustained and 
scalable performance provided by the SP Switch. The SP Switch provides the 
message passing network that connects all of the processors together in a 
way that allows them to send and receive messages simultaneously. 

There are two networking topologies that can be used to connect parallel 
machines: direct and indirect. 

In direct networks, each switching element connects directly to a processor 
node. Each communication hop carries information from the switch of one 
processor node to another. 

Indirect networks, on the other hand, are constructed such that some 
intermediate switch elements connect only to other switch elements. 
Messages sent between processor nodes traverse one or more of these 
intermediate switch elements to reach their destination. 

The advantages of the SP Switch network are: 

• Bisectional bandwidth scales linearly with the number of processor nodes 
in the system. 

Bisectional bandwidth is the most common measure of total bandwidth for 
parallel machines. Consider all possible planes that divide a network into 
two sets with an equal number of nodes in each. Consider the peak 
bandwidth available for message traffic across each of these planes. The 
bisectional bandwidth of the network is defined as the minimum of these 
bandwidths. 

• The network can support an arbitrarily large interconnection network while 
maintaining a fixed number of ports per switch. 

• There are typically at least four shortest-path routes between any two 
processor nodes. Therefore, deadlock will not occur as long as the packet 
travels along any shortest-path route. 

• The network allows packets that are associated with different messages to 
be spread across multiple paths, thus, reducing the occurrence of hot 
spots. 
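The definition of bisectional bandwidth given in the list above can be made concrete with a brute-force calculation. The sketch below is a simplification: it treats the network as a small graph in which every vertex is a processor node and every link has a peak bandwidth, and it ignores intermediate switch elements. The function name and the example ring are illustrative assumptions.

```python
from itertools import combinations


def bisection_bandwidth(nodes, links):
    """Return the minimum bandwidth across any equal split of the nodes.

    nodes is an iterable of node names; links is a list of
    (node_a, node_b, bandwidth) tuples describing bidirectional links.
    """
    ordered = sorted(nodes)
    if len(ordered) < 2 or len(ordered) % 2:
        raise ValueError("need an even number of nodes (at least two)")
    half = len(ordered) // 2
    first, rest = ordered[0], ordered[1:]
    best = None
    # Fixing the first node on one side avoids counting each split twice.
    for others in combinations(rest, half - 1):
        side = {first, *others}
        cut = sum(bw for a, b, bw in links if (a in side) != (b in side))
        best = cut if best is None else min(best, cut)
    return best


# A four-node ring with unit-bandwidth links: 0-1, 1-2, 2-3, 3-0.
ring = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
print(bisection_bandwidth(range(4), ring))
```

For the ring, the best split cuts two links, so the bisectional bandwidth is two link bandwidths. Exhaustive search is only practical for very small examples, because the number of equal splits grows rapidly with the node count.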

The hardware that supports this communication network consists of two 
basic components: the SP Switch Adapter and the SP Switch Board. There 
is one SP Switch Adapter per processor node and, generally, one SP Switch 
Board per frame. This setup provides connections to other processor nodes. 
The SP system also allows switch-board-only frames that provide 
switch-to-switch connections and greatly increase scalability. 

2.8.1 SP Switch hardware components 

This section discusses the hardware components that make up the SP Switch 
network: the Switch Link, the Switch Port, the Switch Chip, the Switch 
Adapter, and the Switch Board. The Switch Link itself is the physical cable 
connecting two Switch Ports. The Switch Ports are hardware subcomponents 
that can reside on a Switch Adapter that is installed in a node or on a Switch 
Chip that is part of a Switch Board. 

2.8.1.1 SP Switch Board 

An SP Switch Board contains eight SP Switch Chips that provide connection 
points for each of the nodes to the SP Switch network as well as for each of 
the SP Switch Boards to the other SP Switch Boards. The SP Switch Chips 
each have a total of eight Switch Ports that are used for data transmission. 
The Switch Ports are connected to other Switch Ports through a physical 
Switch Link. 

In summary, there are 32 external SP Switch Ports in total. Of these, 16 are 
available for connection to nodes, and the other 16 to other SP Switch 
Boards. The SP Switch Board is mounted in the base of the SP Frame above 
the power supplies. 
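The port counts above can be cross-checked with a short calculation. The sketch assumes that every port that is not external is used for chip-to-chip connections inside the board. That is an inference from the figures quoted above and is not stated in the text.

```python
CHIPS_PER_BOARD = 8
PORTS_PER_CHIP = 8
EXTERNAL_PORTS_TO_NODES = 16
EXTERNAL_PORTS_TO_BOARDS = 16


def board_port_budget():
    """Return a breakdown of the Switch Ports on one SP Switch Board."""
    total = CHIPS_PER_BOARD * PORTS_PER_CHIP
    external = EXTERNAL_PORTS_TO_NODES + EXTERNAL_PORTS_TO_BOARDS
    internal = total - external  # assumption: chip-to-chip ports
    return {
        "total_ports": total,
        "external_ports": external,
        "internal_ports": internal,
        "internal_links": internal // 2,  # each link joins two ports
    }


print(board_port_budget())
```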

A schematic diagram of the SP Switch Board is shown in Figure 17. 

[Figure 17 labels: connections to SP nodes; connections to other SP Switch 
Boards] 


Figure 17. SP Switch board 

2.8.1.2 SP Switch Link 

An SP Switch Link connects two switch network devices. It contains two 
channels carrying packets in opposite directions. Each channel includes: 

• Data (8 bits) 

• Data Valid (1 bit) 

• Token signal (1 bit) 

The first two elements here are driven by the transmitting element of the link, 
while the last element is driven by the receiving element of the link. 


2.8.1.3 SP Switch Port 

An SP Switch Port is part of a network device (either the SP Switch Adapter 
or the SP Switch Chip) and is connected to other SP Switch Ports through 
the SP Switch Link. The SP Switch Port includes two ports (input and output) 
for full duplex communication. 

The relationship between the SP Switch Chip Link and the SP Switch Chip 
Port is shown in Figure 18. 



Figure 18. Relationship between switch chip link and switch chip port 

2.8.1.4 SP Switch Chip 

An SP Switch Chip contains eight SP Switch Ports, a central queue, and an 
unbuffered crossbar that allows packets to pass directly from receiving ports 
to transmitting ports. These crossbar paths allow packets to pass through the 
SP Switch (directly from the receivers to the transmitters) with low latency 
whenever there is no contention for the output port. As soon as a receiver 
decodes the routing information carried by an incoming packet, it asserts a 
crossbar request to the appropriate transmitter. If the crossbar request is not 
granted, the crossbar request is dropped (and, hence, the packet will go to 
the central queue). Each transmitter arbitrates crossbar requests on a least 
recently served basis. A transmitter will honor no crossbar request if it is 
already transmitting a packet or if it has packet chunks stored in the central 
queue. Minimum latency is achieved for packets that use the crossbar. 

A schematic diagram of the SP Switch Chip is shown in Figure 19. 

[Figure 19: SP Switch Chip] 
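The crossbar arbitration described for the SP Switch Chip can be sketched as a small model. This is an illustration of the rules in the text (least recently served arbitration, and no grant while the transmitter is busy or has chunks in the central queue). It is not a description of the actual chip logic, and the class and attribute names are hypothetical.

```python
from collections import deque


class Transmitter:
    """Model of one transmitting port arbitrating crossbar requests."""

    def __init__(self, receiver_ids):
        # Receivers ordered from least recently served to most recently served.
        self.lrs_order = deque(receiver_ids)
        self.busy = False        # True while a packet is being transmitted
        self.queued_chunks = 0   # packet chunks waiting in the central queue

    def arbitrate(self, requests):
        """Return the receiver granted the crossbar, or None if none is.

        Receivers whose requests are not granted drop the request, and
        their packets go to the central queue instead.
        """
        if self.busy or self.queued_chunks > 0:
            return None
        winner = next((r for r in self.lrs_order if r in requests), None)
        if winner is not None:
            self.lrs_order.remove(winner)
            self.lrs_order.append(winner)  # now the most recently served
            self.busy = True
        return winner


tx = Transmitter(range(8))
print(tx.arbitrate({2, 5}))  # receiver 2 is served first
print(tx.arbitrate({5}))     # transmitter is busy, so nothing is granted
tx.busy = False
print(tx.arbitrate({2, 5}))  # receiver 5 wins: 2 was served more recently
```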



2.8.1.5 SP Switch Adapter 

Another network device that uses an SP Switch Port is the SP Switch 
Adapter. An SP Switch Adapter includes one SP Switch Port that is 
connected to an SP Switch Board. The SP Switch Adapter is installed in an 
SP node. 

Nodes based on RS/6000s that use the MCA bus use the MCA-based switch 
adapter (#4020). The same adapter is used in uniprocessor thin, wide, and 
SMP High nodes. 

New nodes based on PCI bus architecture (332 MHz SMP Thin and Wide 
nodes, the 200 MHz POWER3 SMP Thin and Wide nodes) must use the 
newer MX-based switch adapters (#4022 and #4023, respectively) since the 
switch adapters are installed on the MX bus in the node. The so-called 
mezzanine, or MX bus, allows the SP Switch Adapter to be connected directly 
onto the processor bus providing faster performance than adapters installed 
on the I/O bus. The newer (POWER3) nodes use an improved adapter based 
on a faster mezzanine (MX2) bus. 

External nodes, such as the 7017-S70, 7017-S7A, and 7017-S80, are based 
on standard PCI bus architecture. If these nodes are to be included as part of 
an SP Switch network, then the switch adapter installed in these nodes is a 
PCI-based adapter (#8396). 

Figure 20 shows a schematic diagram of the SP Switch Adapter. 


[Figure 20 labels: SP Node; SP Switch Link] 


Figure 20. SP Switch adapter 

2.8.1.6 SP Switch system 

The SP Switch system in a single frame of an SP is illustrated in Figure 21 on 
page 42. In one SP frame, there are 16 nodes (maximum) equipped with SP 
Switch Adapters and one SP Switch Board. Sixteen node SP Switch Adapters 
are connected to 16 of 32 SP Switch Ports in the SP Switch Board. The 
remaining 16 SP Switch Ports are available for connection to other SP Switch 
Boards. 

[Figure 21 label: other SP Switch Boards] 


Figure 21. SP Switch system 

2.8.2 SP Switch networking fundamentals 

When considering the network topology of the SP Switch network, nodes are 
logically ordered into groups of 16 that are connected to one side of the SP 
Switch Boards. A 16-node SP system containing one SP Switch Board is 
schematically presented in Figure 22 on page 43. This SP Switch Board that 
connects nodes is called a Node Switch Board (NSB). Figure 22 also 
illustrates the possible shortest-path routes for packets sent from node A to 
two destinations. Node A can communicate with node B by traversing a single 
SP Switch chip and with node C by traversing three SP Switch chips. 

[Figure 22 label: Model Frame] 



Figure 22. 16-Node SP system 


The 16 unused SP Switch ports on the right side of the Node Switch Board 
are used for creating larger networks. There are two ways to do this: 

• For an SP system containing up to 80 nodes, these SP Switch ports 
connect directly to the SP Switch ports on the right side of other node 
switch boards. 

• For an SP system containing more than 80 nodes, these SP Switch ports 
connect to additional stages of switch boards. These additional SP Switch 
Boards are known as Intermediate Switch Boards (ISBs). 

The strategy for building an SP system of up to 80 nodes is shown in Figure 
23 on page 44. The direct connection (made up of 16 links) between two 
NSBs forms a 32-node SP system. Example routes from node A to node B, C, 
and D are shown. Just as for a 16-node SP system, packets traverse one or 
three SP Switch Chips when the source and destination pair are attached to 
the same Node Switch Board. When the source and destination pair are 
attached to different Node Switch Boards, the shortest path routes contain 
four SP Switch Chips. For any pair of nodes connected to separate SP Switch 
Boards in a 32-node SP system, there are four potential paths providing a 
high level of redundancy. 

If we now consider an SP system made up of three frames of thin nodes (48 
nodes in total, see Figure 24), we observe that the number of direct 
connections between frames has now decreased to eight. (Note that for the 
sake of clarity, not all the individual connections between Switch ports of the 
NSBs have been shown; instead, a single point-to-point line in the diagram 
has been used to represent the eight real connections between frames. This 
simplifying representation will be used in the next few diagrams.) Even so, 
there are still four potential paths between any pair of nodes that are 
connected to separate NSBs. 



Figure 24. SP 48-Way system interconnection 

Adding another frame to this existing SP complex further reduces the number
of direct connections between frames. The 4-frame, 64-way schematic 
diagram is shown in Figure 25. Here, there are at least five connections 
between each frame, and note that there are six connections between 
Frames 1 and 2 and between Frames 3 and 4. Again, there are still four 
potential paths between any pair of nodes that are connected to separate 
NSBs. 



Figure 25. 64-Way system interconnection 

If we extend this 4-frame SP complex by adding another frame, the 
connections between frames are reduced again (see Figure 26 on page 46); 
at this level of frame expansion, there are only four connections between 
each pair of frames. However, there are still four potential paths between any 
pair of nodes that are connected to separate NSBs. 

The addition of a sixth frame to this configuration would reduce the number of
direct connections between each pair of frames to below four. In this
hypothetical case, each frame would have three connections to four other
frames and four connections to the fifth frame for a total of 16 connections per
frame. This configuration, however, would result in increased latency and
reduced switch network bandwidth. Therefore, when more than 80 nodes are
required for a configuration, an Intermediate Switch Board (ISB) frame is used
to provide 16 paths between any pair of frames.
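
The link counts quoted for two to six frames are consistent with dividing the 16 frame-facing ports of each NSB as evenly as possible among the other frames. The following Python sketch reproduces that arithmetic. The function name and the even-split assumption are illustrative only and are not part of any SP tool.

```python
def interframe_links(num_frames, ports=16):
    """Links from one frame's NSB to each of the other frames.

    Assumes the `ports` frame-facing switch ports are spread as evenly
    as possible over the remaining frames (an illustrative assumption).
    """
    others = num_frames - 1
    if others < 1:
        return []
    base, extra = divmod(ports, others)
    # `extra` neighbouring frames receive one additional link each.
    return sorted([base] * (others - extra) + [base + 1] * extra)


for frames in range(2, 7):
    print(frames, "frames:", interframe_links(frames))
```

For six frames the sketch yields three links to four frames and four links to the fifth, which is the hypothetical case described above.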

The correct representation of an SP complex made up of six frames with 96 
thin nodes is shown in Figure 27 on page 47. Here, we see that all interframe 
cabling is between each frame’s NSB and the switches within the ISB. This 
cabling arrangement provides for 16 paths between any pair of frames, thus 
increasing network redundancy and allowing the network bandwidth to scale 
linearly. 

2.8.3 SP Switch network products

Since the original RS/6000 SP product was made available in 1993, there 
have been two evolutionary cycles in switch technology. The original switch, 
known as the High Performance Switch (HiPS, feature code #4010), was last 
supported in Parallel System Support Programs (PSSP) Version 2.4. The 
latest version of PSSP software (Version 3.1) does not provide support for the 
HiPS switch. Switch adapters and switches (both 16-port and 8-port) based 
on the old HiPS technology are no longer available. 

2.8.3.1 SP Switch 

The operation of the SP Switch (feature code #4011) has been described in 
the preceding discussion. When configured in an SP order, internal cables 
are provided to support expansion to 16 nodes within a single frame. In 
multi-switch configurations, switch-to-switch cables are provided to enable the 
physical connectivity between separate SP Switch Boards. The required SP 
Switch Adapter connects each SP node to the SP Switch Board. 

2.8.3.2 SP Switch-8 

To meet some customer requirements, eight-port switches provide a low-cost
alternative to the full-size 16-port switches. The 8-port SP Switch-8 (SPS-8,
feature code #4008) provides switch functions for an 8-node SP system. The
SP Switch-8 is compatible with High nodes. The SP Switch-8 is the only
low-cost switch available for new systems.

The SP Switch-8 has two active switch chip entry points. Therefore, the ability 
to configure system partitions is restricted with this switch. With the maximum 
eight nodes attached to the switch, there are two possible system 
configurations: 

• A single partition containing all eight nodes 

• Two system partitions containing four nodes each 

If a switch is configured in an SP system, an appropriate switch adapter is 
required to connect each RS/6000 SP node to the switch subsystem. Table 2 
summarizes the switch adapter requirements for particular node types. We 
have also included here the switch adapter that would be installed in the SP 
Switch router. An overview of this dependent node, along with installation and 
configuration information, can be found in IBM 9077 SP Switch Router: Get 
Connected to the SP Switch, SG24-5157. 


Table 2. Supported switch adapters

SP Node Type                                  Supported SP Switch Adapter
--------------------------------------------  ----------------------------------
160 MHz Thin, 135 MHz Wide, or 200 MHz High   #4020 SP Switch Adapter
332 MHz SMP Thin or Wide Node                 #4022 SP Switch MX Adapter
200 MHz POWER3 SMP Thin or Wide               #4023 SP Switch MX2 Adapter
External, S70, S7A, or S80                    #8396 SP System Attachment Adapter
External, SP Switch Router                    #4021 SP Switch Router Adapter


The 332 MHz and 200 MHz SMP PCI-based nodes listed here have a unique
internal bus architecture that allows the SP Switch Adapters installed in these
nodes to have increased performance compared with previous node types. A
conceptual diagram illustrating this internal bus architecture is shown in
Figure 28 on page 49.

Figure 28. Internal Bus Architecture for PCI-based SMP nodes

These nodes implement the PowerPC MP System Bus (6xx bus). In addition,
the memory-I/O controller chip set includes an independent, separately
clocked mezzanine bus (6xx-MX) to which three PCI bridge chips and the SP
Switch MX or MX2 Adapter are attached. The major difference between these
node types is the clocking rates for the internal buses. The SP Switch
Adapters in these nodes plug directly into the MX bus; they do not use a
PCI slot. The PCI slots in these nodes are clocked at 33 MHz. In contrast, the
MX bus is clocked at 50 MHz in the 332 MHz SMP nodes and at 60 MHz in the
200 MHz POWER3 SMP nodes. Thus, substantial improvements in the
performance of applications using the switch can be achieved.


2.9 Peripheral devices 

The attachment of peripheral devices, such as disk subsystems, tape drives,
CD-ROMs, and printers, is very straightforward on the SP. There are no
SP-specific peripherals; since the SP uses mainstream RS/6000 node
technology, it simply inherits the array of peripheral devices available to the
RS/6000 family. The SP’s shared-nothing architecture gives rise to two key
concepts when attaching peripherals:

1. Each node has I/O slots. Think of each node as a stand-alone machine
when attaching peripherals. It can attach virtually any peripheral available 
to the RS/6000 family, such as SCSI and SSA disk subsystems, Magstar 
tape drives, and so on. The peripheral device attachment is very flexible, 
as each node can have its own mix of peripherals or none at all. 

2. From an overall system viewpoint, as nodes are added, I/O slots are
added; thus, the scalability of I/O device attachment is tremendous. A
512-node High node system would have several thousand I/O slots.

When you attach a disk subsystem to one node, it is not automatically visible 
to all the other nodes. The SP provides a number of techniques and products 
to allow access to a disk subsystem from other nodes.

There are some general considerations for peripheral device attachment: 

• Devices, such as CD-ROMs and bootable tape drives, may be attached 
directly to SP nodes. Nodes must be network-installed by the CWS or a 
boot/install server. 

• Many node types do not have serial ports. High nodes have two serial 
ports for general use. 

• Graphics adapters for attachment of displays are not supported. 


2.10 Network connectivity adapters 

The required SP Ethernet LAN that connects all nodes to the control 
workstation is needed for system administration and should be used 
exclusively for that purpose. Further network connectivity is supplied by 
various adapters, some optional, that can provide connection to I/O devices, 
networks of workstations, and mainframe networks. Ethernet, FDDI,
Token-Ring, HIPPI, SCSI, FCS, and ATM are examples of adapters that can 
be used as part of an SP system. 


2.11 Space requirements 

You must sum the estimated sizes of all the products you plan to run. These
include: 

• An image comprised of the minimum AIX filesets. 

• Images comprised of required PSSP components. 

• Images of PSSP optional components and the graphical user interface, such
as the Resource Center, PTPE, and IBM Virtual Shared Disk.

You can find more information on these space requirements in 7.6.2, “AIX
Automounter” on page 241.


2.12 Software requirements 

The SP system software infrastructure includes: 

• AIX, the base operating system 

• Parallel System Support Program (PSSP) 

• Other IBM system and application software products 

• Independent software vendor products 

The version of PSSP that will run on each type of node is shown in Table 3. 
The application the customer is using may require specific versions of AIX. 
Not all the versions of AIX run on all the nodes; so, this too must be 
considered when nodes are being chosen. 


Table 3. Minimum level of PSSP and AIX that is allowed on each node

Node type                 Min. PSSP level    Minimum AIX level
------------------------  -----------------  -----------------
Uni. Thin                 2.2                4.1.5
332 SMP Thin & Wide       2.4                4.2.1
Uni. Wide                 2.2                4.1.5
SMP High                  2.2                4.1.5
SP Attached Server        3.1                4.3.2
POWER3 SMP Thin & Wide    3.1 plus IX85457   4.3.2
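
The minimum levels in Table 3 lend themselves to a simple lookup when checking a planned node against an installed software level. The following Python sketch is one way to do that. The dictionary keys and function names are hypothetical labels chosen for this example, and the additional fix required for POWER3 SMP nodes is not modeled.

```python
# Minimum (PSSP, AIX) levels per node type, transcribed from Table 3.
# The keys are illustrative labels, not official product identifiers.
MIN_LEVELS = {
    "uni_thin": ("2.2", "4.1.5"),
    "smp332_thin_wide": ("2.4", "4.2.1"),
    "uni_wide": ("2.2", "4.1.5"),
    "smp_high": ("2.2", "4.1.5"),
    "sp_attached_server": ("3.1", "4.3.2"),
    "power3_thin_wide": ("3.1", "4.3.2"),  # extra fix from Table 3 not modeled
}


def as_tuple(version):
    """Turn a dotted version string such as '4.3.2' into (4, 3, 2)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(node_type, pssp, aix):
    """Return True if the given PSSP and AIX levels satisfy Table 3."""
    min_pssp, min_aix = MIN_LEVELS[node_type]
    return (as_tuple(pssp) >= as_tuple(min_pssp)
            and as_tuple(aix) >= as_tuple(min_aix))


print(meets_minimum("smp332_thin_wide", "2.4", "4.2.1"))
print(meets_minimum("power3_thin_wide", "2.4", "4.3.2"))
```

Comparing versions as integer tuples avoids the pitfalls of string comparison (for example, "4.10" sorting before "4.2").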


PSSP provides the functions required to manage an SP system as a
full-function parallel system. PSSP provides a single point of control for
administrative tasks and helps increase productivity by letting administrators
view, monitor, and control system operations.

The PSSP Product ordered for the SP System (9076) is entitled for use
across the entire SP Complex. PSSP V3.1 has been enhanced to allow
attachment of an S70, S70 Advanced Server, or S80. In this case, the feature
is ordered in a quantity equal to the number of nodes.

Software requirements for PSSP Version 2.2 on AIX 4.1: 

• AIX Version 4.1.5 (5765-C34 or 5765-393) 

• C for AIX, Version 3.1 (5765-423) or later 

• C Set ++ for AIX, Version 3.1 (5765-421) or later

• Performance Toolbox for AIX, Agent Component, Version 2.2 (5765-654) 

Software requirements for PSSP Version 2.2 on AIX 4.2.1: 

• AIX Version 4.2 (5765-C34 or 5765-655) or later 

• C for AIX, Version 3.1 (5765-423) or later

• C Set ++, Version 3.1 (5765-421) or later 

• Performance Toolbox for AIX, Agent Component, Version 2.2 (5765-654) 

Software requirements for PSSP Version 2.3 on AIX 4.2: 

• AIX Version 4.2 (5765-C34 or 5765-655) or later 

• C for AIX, Version 3.1 (5765-423) or later

• C Set ++ for AIX, Version 3.1 (5765-421) or later 

• Performance Toolbox for AIX, Agent Component, Version 2.2 (5765-654) 

Software requirements for PSSP Version 2.3 on AIX 4.3.2: 

• AIX version 4.3.2 or later (5765-C34) 

• C for AIX 4.3 or later or 

• C Set and C Set ++ for AIX Version 3.6 or later 

• Performance Toolbox for AIX, Manager Component, Version 2.2 
(5765-654) 

Software requirements for PSSP Version 2.4 on AIX 4.2.1: 

• AIX Version 4.2.1, or later (5765-C34) 

• C for AIX, Version 3.1.4.7 (5765-423) or later 

• C Set ++ for AIX, Version 3.1 (5765-421) or later 

• Performance Toolbox for AIX, Agent Component, Version 2.2 (5765-654) 

Software requirements for PSSP Version 2.4 on AIX 4.3.1: 

• AIX Version 4.3.1 or later (5765-C34) 

• C for AIX, Version 4.3 or later 

• C Set and C Set ++ for AIX, Version 3.6 or later 

• Performance Toolbox for AIX, Agent component, Version 2.2 (5765-654) 

Software requirements for PSSP Version 2.4 on AIX 4.3.2: 

• AIX Version 4.3.2 or later (5765-C34)

• C for AIX, Version 4.3 or later 

• C Set and C Set ++ for AIX, Version 3.6 or later 

• Performance Toolbox for AIX, Manager Component, Version 2.2 
(5765-654) 

Software requirements for PSSP Version 3.1 on AIX 4.3.2, or later: 

• AIX Version 4.3.2 or later (5765-C34) 

• C for AIX, Version 4.3 or later 

• C Set and C Set ++ for AIX, Version 3.6 or later 

• Performance Toolbox for AIX, Manager Component, Version 2.2 
(5765-654) 

AIX 4.3 and its relation to PSSP: 

• 32-bit or 64-bit application coexistence and concurrent execution for those 
who plan to implement 64-bit technology in SP system 

• An Internet- and intranet-ready operating environment 

• Online HTML-based publications 

• Support for multiple authentication methods within the AIX remote 
commands 

• The Network Installation Management (NIM) component of AIX supports 
Distributed Computing Environment (DCE) authentication for remote 
commands 

• Supports PSSP 3.1 for AIX 4.3.2 


2.13 System partitioning 

In a switched SP, the switch chip is the basic building block of a system
partition. If a switch chip is placed in the system partition, then any nodes
connected to that chip’s node switch ports are members of that partition. Any
system partition in a switched SP physically comprises the switch chips,
any nodes attached to ports on those chips, and the links that join those
nodes and chips.

A system partition can be no smaller than a switch chip and the nodes
attached to it, and those nodes would occupy some number of slots in the
frame. The location of the nodes in the frame and their connection to the
chips is a major consideration if you are planning on implementing system
partitioning.

Switch chips connect alternating pairs of slots in the frame. Switch 
boundaries are: 

Nodes 1, 2, 5, 6

Nodes 3, 4, 7, 8 

Nodes 9, 10, 13, 14 

Nodes 11, 12, 15, 16 
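
The switch boundaries listed above can be captured in a small lookup table, which makes it easy to check whether two slots are served by the same switch chip and can therefore never be placed in different partitions. This is a minimal Python sketch. The chip labels 0 to 3 and the function names are arbitrary choices for this example.

```python
# Slot groups served by each node switch chip, as listed in the text.
# The chip labels 0..3 are arbitrary identifiers for this sketch.
SWITCH_CHIP_SLOTS = {
    0: (1, 2, 5, 6),
    1: (3, 4, 7, 8),
    2: (9, 10, 13, 14),
    3: (11, 12, 15, 16),
}


def chip_for_slot(slot):
    """Return the label of the switch chip that serves the given slot."""
    for chip, slots in SWITCH_CHIP_SLOTS.items():
        if slot in slots:
            return chip
    raise ValueError(f"slot {slot} is not in the range 1..16")


def same_chip(slot_a, slot_b):
    """True if both slots attach to the same switch chip."""
    return chip_for_slot(slot_a) == chip_for_slot(slot_b)


print(same_chip(1, 6))   # slots 1 and 6 share a chip
print(same_chip(2, 3))   # slots 2 and 3 do not
```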

For a single-frame system with 16 slots, the possible system partitioning
configurations, given as the number of slots per partition, are:

One system partition: 16 

Two system partitions: 12-4 or 8-8 

Three system partitions: 4-4-8 

Four system partitions: 4-4-4-4 

System partitioning is shown in Figure 29. 



Figure 29. System partitioning 
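
Because a partition can be no smaller than one switch chip, the partition sizes listed above are the ways of grouping four chips of four slots each. The following Python sketch enumerates those groupings as integer partitions. It only derives the slot counts and does not decide which particular chips may be combined, and the function name is hypothetical.

```python
def partition_sizes(num_chips=4, slots_per_chip=4):
    """List every split of `num_chips` switch chips into partitions.

    Each result is a list of slot counts, largest partition first.
    """
    def splits(remaining, largest):
        # Integer partitions of `remaining` with parts no bigger than `largest`.
        if remaining == 0:
            yield []
            return
        for part in range(min(remaining, largest), 0, -1):
            for rest in splits(remaining - part, part):
                yield [part] + rest

    return [[chips * slots_per_chip for chips in split]
            for split in splits(num_chips, num_chips)]


for sizes in partition_sizes():
    print("-".join(str(size) for size in sizes))
```

The sketch yields one layout with a single partition, two with two partitions, one with three, and one with four, matching the list in the text.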

2.14 Configuration rules

The RS/6000 SP system has extremely wide scalability. In a standard
configuration, the RS/6000 SP system mounts up to 128 processor nodes.
This section provides you with information on how you can expand your SP 
system and what kind of configuration fits your requirement. Also, in this 
section, we provide a set of rules and sample configurations to introduce you 
to the design of more complex SP configurations. You may use these 
configuration rules as a checklist when you configure your SP system. 

This section uses the following current node, frame, switch, and switch 
adapter types to configure SP systems: 

Nodes 

• 160 MHz Thin node (feature code #2022). 

• 332 MHz SMP Thin node (feature code #2050). 

• 332 MHz SMP Wide node (feature code #2051). 

• POWER3 SMP Thin node (feature code #2052). 

• POWER3 SMP Wide node (feature code #2053). 

• 200 MHz SMP High node (feature code #2009). This node is withdrawn 
from marketing. 

• RS/6000 Server Attached node (feature code #9122). 

Frames 

• Short model frame (model 500) 

• Tall model frame (model 550) 

• Short expansion frame (feature code #1500) 

• Tall expansion frame (feature code #1550) 

• SP Switch frame (feature code #2031) 

• RS/6000 server frame (feature code #9123) 

Switches 

• SP Switch-8 (8-port switch, feature code #4008) 

• SP Switch (16-port switch, feature code #4011) 

Switch Adapter 

• SP Switch Adapter (feature code #4020) 

• SP Switch MX adapter (feature code #4022)

• SP Switch MX2 adapter (feature code #4023) 

• SP System attachment adapter (feature code #8396) 

The RS/6000 SP configurations are very flexible. Several types of processor
nodes can be intermixed within a frame. However, there are some basic
configuration rules that come into play.


- Configuration Rule 1 -

Tall frames and short frames cannot be mixed within an SP system.


All frames in an SP configuration must either be tall frames or short frames 
but not a mixture of both. An SP Switch frame is classified as a tall frame. You 
can use an SP Switch frame with tall frame configurations. 

- Configuration Rule 2 - 

If there is a single PCI Thin node in a drawer, it must be installed in the odd 
slot position (left side of the drawer). 


With the announcement of the POWER3 SMP nodes in 1999, a single PCI
Thin node is allowed to be mounted in a drawer. In this circumstance, it must
be installed in the odd slot position (left side). This is because the lower slot
number is what counts when a drawer is not fully populated. Moreover,
different PCI Thin nodes are allowed to be mounted in the same drawer; for
example, you can install a POWER3 SMP Thin node in the left side of a drawer
and a 332 MHz Thin node in the right side of the same drawer.
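
Configuration rules 1 and 2 are simple enough to express as checks over a planned layout. The Python sketch below is illustrative only. The data shapes (a list of frame heights, and a drawer as a left/right pair) and the function names are assumptions made for this example.

```python
def frames_are_uniform(frame_heights):
    """Rule 1: every frame is 'tall' or every frame is 'short'.

    An SP Switch frame counts as 'tall', as stated in the text.
    """
    return len(set(frame_heights)) <= 1


def drawer_obeys_rule_2(drawer):
    """Rule 2: a lone PCI Thin node must sit in the odd (left) slot.

    `drawer` is a (left, right) pair; None marks an empty slot.
    """
    left, right = drawer
    lone_thin_on_right = left is None and right == "pci_thin"
    return not lone_thin_on_right


print(frames_are_uniform(["tall", "tall", "tall"]))
print(frames_are_uniform(["tall", "short"]))
print(drawer_obeys_rule_2(("pci_thin", None)))
print(drawer_obeys_rule_2((None, "pci_thin")))
```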

Based on the configuration rule 1, the rest of this section is separated into 
two major parts. The first part provides the configuration rule for using short 
frames, and the second part provides the rules for using tall frames. 

2.14.1 Short frame configurations 

Short frames can be developed into two kinds of configurations: non-switched
and switched configurations. The supported switch for short frame
configurations is the SP Switch-8. Only one to eight internal nodes can be
mounted in short frame configurations. The SP-Attached servers are not
supported in short frame configurations. In addition to configuration rule 2, a
single PCI Thin node must be the last node in a short frame.

- Configuration Rule 3 -

A short model frame must be completely full before a short expansion
frame can mount nodes. You are not allowed any embedded empty drawers.


2.14.1.1 Non-switched Short Frame Configurations 

This configuration does not have a switch and mounts one to eight nodes. A 
minimum configuration is formed by one short model frame and one PCI Thin 
node, or one Wide node, or one High node, or one pair of MCA thin nodes as 
shown in Figure 30 on page 57. 



Figure 30. Minimum non-switched short frame configurations 

The short model frame must be completely full before the short expansion 
frame can mount nodes as shown in Figure 31.



Figure 31. Example of non-switched short frame configuration 

2.14.1.2 SP Switch-8 short frame configurations 

This configuration mounts one to eight nodes and connects through a single 
SP Switch-8. These nodes are mounted in one required short model frame 
containing the SP Switch-8 and additional non-switched short expansion 
frames. Each node requires a supported SP Switch adapter. Nodes in the
non-switched short expansion frames share unused switch ports in the short
model frame. Figure 32 shows an example of a maximum SP Switch-8 short
frame configuration.

- Configuration Rule 4 - 

A short frame supports only a single SP Switch-8 board. 



Figure 32. Maximum SP Switch-8 short frame configurations 





2.14.2 Tall frame configurations 

The tall frame offers several configurations, and it is more flexible than the 
short frame. The SP-Attached servers are supported in tall frame 
configurations. There are four kinds of tall frame configurations based on the 
switch type. 

1. Non-switched configuration 

2. SP Switch-8 configuration 

3. Single stage SP Switch configuration

4. Two stage SP Switch configuration 

- Configuration Rule 5 - 

Tall frames support SP-Attached servers. 


2.14.2.1 Non-switched tall frame configurations

This configuration does not have a switch. A minimum configuration is formed 
by one tall model frame and a single PCI thin node, or one Wide node, or one 
High node, or one pair of MCA thin nodes. In contrast to the short frame 
configuration, the tall expansion frame can mount nodes even when the 
model frame has some empty drawers. It provides more flexibility in adding 
more nodes in the future. 

2.14.2.2 SP Switch-8 tall frame configurations 

This configuration mounts one to eight nodes and connects through a single 
SP Switch-8. A minimum configuration is formed by one tall model frame
equipped with an SP Switch-8 and a single PCI thin node, or one Wide node, or
one High node, or one pair of MCA thin nodes. Each node requires a
supported SP Switch adapter. A non-switched tall expansion frame may be
added, and nodes in an expansion frame share unused switch ports in the
model frame. You are not allowed any embedded empty drawers. Again, if
there is a single PCI Thin node in a drawer, it must be placed as the last node
in a frame. Figure 33 on page 60 shows examples of SP Switch-8 tall frame
configurations.

Figure 33. Example of SP Switch-8 Tall Frame Configurations


2.14.2.3 Single stage SP Switch configurations 

This is probably the most common SP configuration. It provides both
scalability and flexibility. This configuration can mount one to eighty processor
nodes in one required tall model frame with an SP Switch and additional
switched and/or non-switched expansion frames. A minimum configuration is
formed by one tall model frame equipped with an SP Switch and a single PCI
thin node, or one Wide node, or one High node, or one pair of MCA thin
nodes. Each node requires a supported SP Switch adapter. Empty drawers
are allowed in this configuration.

Single stage SP Switch with single SP Switch configurations 

If your SP system has no more than 16 nodes, a single SP Switch is enough.
In this circumstance, non-switched expansion frames may be added depending
on the number of nodes and node locations (see 2.15.4, “The switch port
numbering rule” on page 67 and Figure 39 on page 69).

Figure 34 on page 62 shows an example of single stage SP Switch 
configuration with no more than 16 nodes. 

In configuration (a), four Wide nodes and eight Thin nodes are mounted in a 
tall model frame equipped with an SP Switch. There are four available switch 
ports that you can use to attach SP-Attached servers or SP Switch routers. 
Expansion frames are not supported in this configuration because there are 
Thin nodes on the right side of the model frame. 

- Configuration Rule 6 -

If a model frame or switched expansion frame has Thin nodes on the right
side, it cannot support non-switched expansion frames.


In configuration (b), six Wide nodes and two PCI Thin nodes are mounted in a 
tall model frame equipped with an SP Switch. There also is a High node, two 
Wide nodes, and four PCI Thin nodes mounted in a non-switched expansion 
frame. Note that all PCI Thin nodes on the model frame must be placed on 
the left side to comply with configuration rule 6. All Thin nodes on an
expansion frame are also placed on the left side to comply with the switch
port numbering rule. There is one available switch port that you can use to 
attach SP-Attached servers or SP Switch routers. 

In configuration (c), there are eight Wide nodes mounted in a tall model frame
equipped with an SP Switch and four High nodes mounted in a non-switched
expansion frame (frame 2). The second non-switched expansion frame (frame
3) houses a High node, two Wide nodes, and one PCI Thin node. This
configuration occupies all 16 switch ports in the model frame. Note that Wide
nodes and PCI Thin nodes in frame 3 have to be placed on High node
locations.

Now try to describe configuration (d). If you want to add two POWER3
Thin nodes, what would be the locations?

A maximum of three non-switched expansion frames can be attached to each 
model frame and switched expansion frame. 

Figure 34. Example of single SP Switch configurations

Single stage with multiple SP Switch configurations 

If your SP system has 17 to 80 nodes, switched expansion frames are
required. You can add switched expansion frames and non-switched 
expansion frames. Nodes in the non-switched expansion frame share unused 
switch ports that may exist in the model frame and in the switched expansion 
frames. Figure 35 shows an example of a Single Stage SP Switch with both 
switched and non-switched expansion frame configurations. There are four 
SP Switches; each can support up to 16 processor nodes. Therefore, this 
example configuration can mount a maximum of 64 nodes. 

Figure 35. Example of a multiple SP Switch configuration

2.14.2.4 Two stage SP Switch configurations 

This configuration requires an SP Switch frame that forms the second 
switching layer. A minimum of 24 processor nodes are required to make this 
configuration work. It supports up to 128 nodes. Each node requires a 
supported SP Switch adapter. These nodes are mounted in one required tall 
model frame equipped with an SP Switch and at least one switched 
expansion frame. The SP Switch in these frames forms the first switching 
layer. The SP Switch frame is also required if you want more than 80 nodes or 
more than four switched expansion frames. This configuration can utilize both 
switched and non-switched expansion frames as well. Nodes in the 
non-switched expansion frame share unused switch ports that may exist in 
the model frame. 

Figure 36. Example of two stage SP Switch configuration
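
The node-count limits given for the switched tall frame configurations in 2.14.2 can be summarized in a small helper that lists which configurations could accommodate a planned number of nodes. This is a rough Python sketch based only on the limits stated in this section. It ignores node placement, frame counts, and all other rules, and the function name is hypothetical.

```python
def switched_tall_frame_options(nodes):
    """Return the switched tall frame configurations that fit `nodes`.

    Uses only the node-count ranges quoted in section 2.14.2.
    """
    options = []
    if 1 <= nodes <= 8:
        options.append("SP Switch-8")
    if 1 <= nodes <= 16:
        options.append("single stage, single SP Switch")
    if 17 <= nodes <= 80:
        options.append("single stage, multiple SP Switches")
    if 24 <= nodes <= 128:
        options.append("two stage SP Switch")
    return options


for count in (6, 16, 48, 96):
    print(count, switched_tall_frame_options(count))
```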


2.15 Numbering rules 

In order to place nodes in an SP system, you need to know the following 
numbering rules: 

• The frame numbering rule 

• The slot numbering rule 

• The node numbering rule 

• The SP Switch port numbering rule 

2.15.1 The frame numbering rule 

The administrator establishes the frame numbers when the system is 
installed. Each frame is referenced by the tty port to which the frame 
supervisor is attached and is assigned a numeric identifier. The order in 
which the frames are numbered determines the sequence in which they are 
examined during the configuration process. This order is used to assign 
global identifiers to the switch ports, nodes, and SP Expansion I/O Units. This 
is also the order used to determine which frames share a switch. 

If you have an SP Switch frame, you must configure it as the last frame in
your SP system. Assign a high frame number to an SP Switch frame to allow 
for future expansion. 

2.15.2 The slot numbering rule 

A tall frame contains eight drawers that have two slots each for a total of 16 
slots. A short frame has only four drawers and eight slots. When viewing a tall 
frame from the front, the 16 slots are numbered sequentially from bottom left 
to top right. 

The position of a node in an SP system is sensed by the hardware. That 
position is the slot to which it is wired and is the slot number of the node. 

• A thin node occupies a single slot in a drawer, and its slot number is the 
corresponding slot. 

• A Wide node occupies two slots, and its slot number is the 
odd-numbered slot. 

• A High node occupies four consecutive slots in a frame. Its slot number 
is the first (lowest number) of these slots. 

Figure 37 on page 66 shows slot numbering for tall frames and short frames. 

An SP-Attached server is managed by the PSSP components as if it were in a
frame of its own. However, it does not enter into the determination of the
frame and switch configuration of your SP system. It has the following 
additional characteristics: 

• It is the only node in its frame. It occupies slot number 1 but uses the 
full 16 slot numbers. Therefore, 16 is added to the node number of the 
SP-Attached server to get the node number of the next node. 

• It cannot be the first frame. 

• It connects to a switch port of a model frame or a switched expansion 
frame. 

• It cannot be inserted between a switched frame and any non-switched 
expansion frame using that switch. 






2.15.3 The node numbering rule 

A node number is a global ID assigned to a node. It is the primary means by 
which an administrator can reference a specific node in the system. Node 
numbers are assigned for all nodes including SP-Attached servers regardless 
of node or frame type. Replace node number with expansion number for the 
global ID of an SP Expansion I/O Unit. Global IDs are assigned using the 
following formula: 

node_number = ((frame_number - 1) x 16) + slot_number 

where slot_number is the lowest slot number occupied by the node or unit. 
Each type (size) of node occupies one slot or a consecutive sequence of 
slots. For each node, there is an integer n such that a thin node or expansion 
unit occupies slot n, a Wide node occupies slots n, n+1, and a High node 
occupies n, n+1, n+2, n+3. An SP-Attached server is considered to be one 
node in one frame. For single thin nodes (not in a pair), Wide nodes, and High
nodes, n must be odd. For an SP-Attached server, n is 1. Use n in place of
slot_number in the formula.
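
The node numbering formula and the slot ranges occupied by each node size can be written out directly. The following Python sketch is a minimal illustration. The names `node_number`, `occupied_slots`, and `NODE_WIDTH` are chosen for this example and are not PSSP commands or interfaces.

```python
SLOTS_PER_FRAME = 16

# Number of consecutive slots occupied by each node size.
NODE_WIDTH = {"thin": 1, "wide": 2, "high": 4}


def node_number(frame_number, slot_number):
    """node_number = ((frame_number - 1) x 16) + slot_number."""
    return (frame_number - 1) * SLOTS_PER_FRAME + slot_number


def occupied_slots(kind, n):
    """Slots n, n+1, ... used by a node of the given size starting at slot n."""
    return list(range(n, n + NODE_WIDTH[kind]))


print(node_number(1, 1))          # first slot of frame 1
print(node_number(2, 5))          # slot 5 of frame 2
print(node_number(4, 1))          # SP-Attached server as frame 4 (n = 1)
print(occupied_slots("high", 5))  # High node starting at slot 5
```

Because an SP-Attached server uses slot 1 of its own frame, the first node of the following frame is numbered 16 higher, which matches the rule given in 2.15.2.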

Node numbers are assigned independent of whether the frame is fully 
populated. Figure 38 on page 67 demonstrates node numbering. Frame 4 
represents an SP-Attached server in a position where it does not interrupt the 
switched frame and companion non-switched expansion frame configuration. 
It can use a switch port on frame 2, which is left available by the High nodes 
in frame 3. Its node number is determined by using the previous formula. 

Figure 38. Node numbering for an SP system


2.15.4 The switch port numbering rule 

In a switched system, the switch boards are attached to each other to form a 
larger communication fabric. Each switch provides some number of ports to
which a node can connect (16 ports for an SP Switch and 8 ports for the SP
Switch-8). In larger systems, additional switch boards (intermediate switch
boards) in the SP Switch frame are provided for switch board connectivity;
such boards do not provide node switch ports.

Switch boards are numbered sequentially starting with 1 from the frame with 
the lowest frame number to that with the highest frame number. Each full 
switch board contains a range of 16 switch port numbers (also known as 
switch node numbers) that can be assigned. These ranges are also in 
sequential order with their switch board number. For example, switch board 1 
contains switch port numbers 0 through 15. 

Switch port numbers are used internally in PSSP software as a direct index 
into the switch topology and to determine routes between switch nodes. 

Switch port numbering for an SP Switch 

The SP Switch has 16 ports. Whether a node is connected to a switch within 
its frame or to a switch outside of its frame, you can use the following formula 
to determine the switch port number to which a node is attached: 

switch_port_number = ((switch_number - 1) x 16) + switch_port_assigned

where switch_number is the number of the switch board to which the node is 
connected, and switch_port_assigned is the number assigned to the port on 
the switch board (0 to 15) to which the node is connected. 
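
As a quick check of the formula, the following Python sketch computes the switch port number from the switch board number and the port assigned on that board. The function name and the range check are additions made for this example.

```python
PORTS_PER_SP_SWITCH = 16


def switch_port_number(switch_number, switch_port_assigned):
    """switch_port_number = ((switch_number - 1) x 16) + switch_port_assigned."""
    if not 0 <= switch_port_assigned < PORTS_PER_SP_SWITCH:
        raise ValueError("switch_port_assigned must be in the range 0 to 15")
    if switch_number < 1:
        raise ValueError("switch boards are numbered starting with 1")
    return (switch_number - 1) * PORTS_PER_SP_SWITCH + switch_port_assigned


print(switch_port_number(1, 0))   # first port on switch board 1
print(switch_port_number(1, 15))  # last port on switch board 1
print(switch_port_number(2, 3))   # port 3 on switch board 2
```

Switch board 1 therefore covers switch port numbers 0 through 15, as stated earlier in this section, and switch board 2 starts at 16.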

Figure 39 on page 69 shows the frame and switch configurations that are
supported and the switch port number assignments in each node. Each
configuration is described in more detail below.

In configuration 1, the switched frame has an SP Switch that uses all 16 of its 
switch ports. Since all switch ports are used, the frame does not support 
non-switched expansion frames. 

If the switched frame has only Wide nodes, it could use, at most, eight switch 
ports and, therefore, has eight switch ports to share with non-switched 
expansion frames. These expansion frames are allowed to be configured as 
in configuration 2 or configuration 3. 

In configuration 4, four High nodes are mounted in the switched frame. 
Therefore, its switch can support 12 additional nodes in non-switched 
expansion frames. Each of these non-switched frames can house a maximum 
of four High nodes. If Wide nodes are used, they must be placed in the High 
node slot positions. 

A single PCI Thin node may be mounted in a drawer by itself, so it may also
be mounted in non-switched expansion frames. In this circumstance, it
must be installed in the Wide node slot positions (configuration 2) or the
High node slot positions (configurations 3 and 4).

Figure 39. Switch port numbering for an SP Switch

Switch port numbering for an SP Switch-8

Node numbers for short and tall frames are assigned using the same 
algorithm. See “Node Numbering Rule” on page 66. 

An SP system with an SP Switch-8 contains only switch port numbers 0
through 7. The following algorithm is used to assign nodes their switch
port numbers in systems with eight-port switches:

1. Assign the node in slot 1 to switch_port_number = 0. Increment
switch_port_number by 1.

2. Check the next slot. If there is a node in the slot, assign it the current
switch_port_number, then increment the number by 1.

Repeat step 2 until you reach the last slot in the frame or switch port
number 7, whichever comes first.
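
The algorithm above can be sketched in Python as follows. The function name, the set-of-slots input, and the default of eight slots per frame are assumptions made for illustration; the sketch also assumes, as step 1 implies, that slot 1 is occupied.

```python
def assign_switch8_ports(occupied_slots, slots_in_frame=8):
    """Return {slot: switch_port_number} for a frame with an SP Switch-8.

    occupied_slots -- slot numbers that hold a node (slot 1 must be occupied)
    slots_in_frame -- number of slots to scan (hypothetical default of 8)
    """
    if 1 not in occupied_slots:
        raise ValueError("the algorithm starts with the node in slot 1")
    assignments = {}
    next_port = 0
    for slot in range(1, slots_in_frame + 1):
        if next_port > 7:          # switch port number 7 has been handed out
            break
        if slot in occupied_slots:
            assignments[slot] = next_port
            next_port += 1
    return assignments


if __name__ == "__main__":
    # Four nodes occupying slots 1, 3, 5, and 7.
    print(assign_switch8_ports({1, 3, 5, 7}))
```

Note that the switch port number follows the order in which occupied slots are found, not the slot number itself.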

Figure 40 shows sample switch port numbers for a system with a short frame 
and an SP Switch-8. 

Figure 40. Example of switch port numbering for an SP Switch-8


2.16 Related documentation 

These documents will help you understand the concepts and examples 
covered in this guide in order to maximize your chances of success in the 
exam. 

SP Manuals 

The book RS/6000 SP: Planning Vol 1, Hardware and Physical Environment,
GA22-7280, is a helpful hardware reference. It is included here to help you
select the nodes, frames, and other components you need and to ensure that
you have the correct physical configuration and environment.

RS/6000: Planning Volume 2, GA22-7281, is a good reference to help plan
and make decisions about what components to install and also which nodes, 
frames, and switches to use depending on the purpose. 

332 MHz Thin and Wide Node Service, GA22-7330. The book explains the 
configuration of 332 MHz Thin and Wide nodes. 

SP Redbooks 

Inside the RS/6000 SP, SG24-5145, serves as an excellent reference for 
understanding the various SP system configurations you could have. 

RS/6000 SP Systems Handbook, SG24-5596, is a comprehensive guide 
dedicated to the RS/6000 SP product line. Major hardware and software 
offerings are introduced and their prominent functions discussed. 


2.17 Sample questions 

This section provides a series of questions to help you prepare for
the certification exam. The answers to these questions can be found in
Appendix A.

1. The SP Switch router node is an extension node. It can support multiple 
switch adapter connections for higher availability and performance. Which 
of the following is not a requirement of extension nodes? 

A. CWS 

B. PSSP 2.4 or higher on Primary node 

C. Primary node 

D. Backup node 

2. Which of the following is not a true statement regarding the capability of an 
SP Switch over a High Performance Switch? 

A. Fault isolation 

B. Compatible with older HiPS Switches 

C. Improved bandwidth 

D. Higher availability 

3. Which is a minimum prerequisite for PSSP Version 3 release 1? 

A. AIX Version 4.3.2 

B. IBM C for AIX, Version 4.3 

C. Performance Toolbox Parallel Extensions (PTPE) 

D. IBM Performance Toolbox, Manager Component, Version 2.2

4. A customer is upgrading an existing 200 MHz High node to the new 332 
MHz SMP Thin node. The SP system contains an SP Switch. How many 
available adapter slots will the customer have on the new node? 

A. Two PCI slots. The Ethernet is integrated, and the SP Switch has a 
dedicated slot. 

B. Eight PCI slots. Two slots are used by an Ethernet adapter and the SP 
Switch adapter. 

C. Ten PCI slots. The Ethernet is integrated, and the SP Switch has a 
dedicated slot. 

D. Nine PCI slots. The Ethernet is integrated, and the SP Switch adapter 
takes up one PCI slot. 

5. The SP-Attached server requires connectivity with the SP system in order
to establish a functional and safe network. How many connections are
required?

A. 1 

B. 3 

C. 4 

D. 6 

6. Which of the following is NOT a function provided by the control 
workstation? 

A. Authentication Service 

B. File Collection Service 

C. Boot/Install Service

D. Ticket Granting Service 

7. Which of the following is a minimum prerequisite for the CWS hardware? 

A. Two color graphic adapters 

B. SP Ethernet adapters for connections to the SP Ethernet 

C. 2 GB of disk storage 

D. 1 GB of main memory 

8. Which of the following is NOT a component of the SP Switch Network? 

A. SP Switch adapter 

B. SP Switch board 

C. SP Switch port

D. SP Switch router 

9. A customer adds a second frame to their SP. A decision is made to
configure one or more boot/install servers to offload the CWS. Which of the
following is the minimum requirement for a boot/install server to offload
the control workstation?

A. A boot/install server for each node type

B. A boot/install server of each node type

C. A boot/install server in either frame 1 or frame 2

D. A boot/install server in each frame 

10. Which of the following statements regarding configuration rules is correct? 

A. The tall frame and short frames can be mixed within an SP system. 

B. A short frame supports only a single SP Switch-8 board. 

C. Tall frames do not support SP-Attached servers.

D. If there is a single PCI Thin node in a drawer, it must be installed in the 
even slot position (right side of the drawer). 


2.18 Exercises 

Here are some exercises that you may wish to perform: 

1. Utilizing the study guide test environment (Figure 1) on page 3, describe 
the necessary steps to add an SP-attached server to the current 
environment. 

2. Refer to the study guide test environment for the following exercise: 
Describe the necessary steps to add a third switch frame with one SMP High 
node and two SMP Thin nodes to the current environment. 

3. What are the necessary steps to make the first node on the second frame a
boot/install server? Refer to the study guide test environment (Figure 1 on
page 3). Assume that the first node on frame one is already a BIS.

4. Describe the configuration of figure (d) on page 62. Assuming that you
need to add two POWER3 Thin nodes, where would they be located?

5. The SP Switch router requires a minimum of three connections with your 
SP system. What are the required connections? 


Chapter 3. RS/6000 SP networking


This chapter covers the networking issues of an RS/6000 SP system. It 
discusses the different name resolution mechanisms you have available on 
the SP as well as Ethernet segmentation and routing. Network topology and 
the impact on the RS/6000 SP subsystems are also discussed. 


3.1 Key concepts you should study 

The concepts explained in this section will give you a good preparation for the
networking-related questions in the RS/6000 SP certification exam. In order to
maximize your chances, you should become familiar with:

• How to create specific host names, TCP/IP addresses, netmask values, and
default routes.

• How to determine the name resolution mechanism, such as host table,
DNS, or NIS, that best fits your needs.

• How to determine the Ethernet topology, segmentation, and routing in the 
SP System. 


3.2 Name, address, and network integration planning 

You must assign IP addresses and host names for each network connection 
on each node and on the control workstation in your SP system. Because you 
probably want to attach the SP system to your site networks, you need to plan 
how to do this. You need to decide what routers and gateways you will use, 
what default and network routes you need on your nodes, and how you will 
establish these default and network routes. 

You need to ensure that all of the addresses you assign are unique within 
your site network and within any outside networks to which you are attached, 
such as the Internet. Also, you need to plan how names and addresses will 
be resolved on your systems (that is, using DNS name servers, NIS maps, 
/etc/hosts files, or some other method). 

3.2.1 Set host name 

Independent of any of the network adapters, each machine has a host name. 
Usually, the host name is the name given to one of the network adapters in 
the machine. We need to set the host name on the control workstation. 

A sample of smit hostname is shown in Figure 41 on page 76. 


© Copyright IBM Corp. 2000 

                                 Set Hostname

Please refer to Help for information
concerning hostname / INTERNET address mapping

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                  [Entry Fields]
* HOSTNAME (symbolic name of your machine)        [sp3n0]

F1=Help      F2=Refresh     F3=Cancel     F4=List
F5=Reset     F6=Command     F7=Edit       F8=Image
F9=Shell     F10=Exit       Enter=Do


Figure 41. Set the host name on the control workstation 

To set the host name on the control workstation, issue the smit fast path:

smit hostname 

In SP systems, this is known as the Initial Hostname. 

3.2.2 Set IP address and netmask 

You will need at least one Ethernet subnet for your system. You will need an 
IP address per node and control workstation. 

Each network adapter needs to have a specific IP address. To set an IP 
address to an adapter on the control workstation, enter the smit fastpath: 

smit mktcpip 

You select the network adapter you want to configure and fill in the IP 
address and netmask assigned for this adapter. Please be sure that you have 
the correct combination of IP address and netmask. The netmask can be 
defined based on the IP address class. A sample of smit mktcpip is shown in 
Figure 42 on page 77. 

                        Minimum Configuration & Startup

To Delete existing configuration data, please use Further Configuration menus

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                  [Entry Fields]
* HOSTNAME                                        [sp3n0]
* Internet ADDRESS (dotted decimal)               [192.168.3.130]
  Network MASK (dotted decimal)                   [255.255.255.0]
* Network INTERFACE                               en0
  NAMESERVER
      Internet ADDRESS (dotted decimal)           []
      DOMAIN Name                                 []
  Default GATEWAY Address                         [9.12.0.1]
  (dotted decimal or symbolic name)
  Your CABLE Type                                 bnc                +
  START Now                                       no                 +

F1=Help      F2=Refresh     F3=Cancel     F4=List
F5=Reset     F6=Command     F7=Edit       F8=Image
F9=Shell     F10=Exit       Enter=Do


Figure 42. Set IP address and netmask on the control workstation 

The en0 adapter (first Ethernet adapter) on the nodes needs to be configured
with an IP address and a name. This name is known as the Reliable
Hostname. The control workstation and several subsystems (such as
Kerberos) will use this Reliable Hostname and the en0 adapter for
communication.
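
To check that an IP address and netmask combine as intended, a small script can help. The following is a minimal sketch using Python's standard ipaddress module; the function names are illustrative, and the second address used in the example (192.168.3.1) is a hypothetical node address, not one taken from this guide.

```python
import ipaddress


def classful_netmask(address: str) -> str:
    """Return the default netmask implied by the address class (A, B, or C)."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:
        return "255.0.0.0"        # class A
    if first_octet < 192:
        return "255.255.0.0"      # class B
    if first_octet < 224:
        return "255.255.255.0"    # class C
    raise ValueError("class D and E addresses have no default netmask")


def same_subnet(addr_a: str, addr_b: str, netmask: str) -> bool:
    """True if both addresses fall in the same subnet for the given netmask."""
    network = ipaddress.IPv4Network(f"{addr_a}/{netmask}", strict=False)
    return ipaddress.IPv4Address(addr_b) in network


if __name__ == "__main__":
    print(classful_netmask("192.168.3.130"))
    # Hypothetical node address checked against the control workstation.
    print(same_subnet("192.168.3.130", "192.168.3.1", "255.255.255.0"))
```

Two adapters that must talk to each other without a route, such as a node's en0 and the adapter of its boot/install server, should pass the same_subnet check.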

3.2.3 Set routes 

If you have different subnets in your network, it is very important that you
define a specific route from your SP system to each of them. By defining a
route, you tell the node how to reach the other subnet through the selected
gateway. The gateway is the IP address of a machine that is able to reach the
other subnets.

Routing is very important in RS/6000 SP environments. PSSP supports 
multiple subnets, but all the nodes need to be able to access those subnets if 
nodes in the same partition reside there. Every node must have access to the 
control workstation even when it is being installed from a boot/install server 
other than the control workstation. 

Before configuring boot/install servers for other subnets, make sure the 
control workstation has routes defined to reach each one of the additional 
subnets. 

To set up static routes, you may use smit or the command line. To add routes
using the command line, use the route command:

route add -net <ip_address_of_other_network> <ip_address_of_gateway>

where: 

<ip_address_of_other_network> is the IP address of the other network in your 
LAN. 

<ip_address_of_gateway> is the IP address of the gateway. 

For example: 

route add -net 192.168.15 -netmask 255.255.255.0 9.12.0.130 
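
To see what such a route means for a packet, the lookup can be sketched in Python. This is a simplified model, not how AIX implements routing; the function name is illustrative, and the local network 9.12.0.0/24 in the example is a hypothetical value chosen so that the gateway 9.12.0.130 is directly reachable.

```python
import ipaddress


def next_hop(destination, local_networks, static_routes):
    """Return "direct", a gateway address, or None if no route matches.

    local_networks -- networks the machine is directly attached to
    static_routes  -- list of (network, gateway) pairs
    """
    dest = ipaddress.IPv4Address(destination)
    for net in local_networks:
        if dest in ipaddress.IPv4Network(net):
            return "direct"
    matches = [
        (ipaddress.IPv4Network(net), gateway)
        for net, gateway in static_routes
        if dest in ipaddress.IPv4Network(net)
    ]
    if not matches:
        return None
    # Prefer the most specific (longest prefix) matching route.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


if __name__ == "__main__":
    local = ["9.12.0.0/24"]                                   # hypothetical
    routes = [("192.168.15.0/255.255.255.0", "9.12.0.130")]   # from the example
    print(next_hop("192.168.15.7", local, routes))
    print(next_hop("9.12.0.5", local, routes))
```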

A sample of smit mkroute is shown in Figure 43. 


                               Add Static Route

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                  [Entry Fields]
  Destination TYPE                                net                +
* DESTINATION Address                             [192.168.15.0]
  (dotted decimal or symbolic name)
* Default GATEWAY Address                         [9.12.0.130]
  (dotted decimal or symbolic name)
* METRIC (number of hops to destination gateway)  [1]                #
  Network MASK (hexadecimal or dotted decimal)    [255.255.255.0]

F1=Help      F2=Refresh     F3=Cancel     F4=List
F5=Reset     F6=Command     F7=Edit       F8=Image
F9=Shell     F10=Exit       Enter=Do


Figure 43. Adding a route using SMIT mkroute 


3.2.4 Host name resolution 

TCP/IP provides a naming system that supports both flat and hierarchical
network organization so that users can use meaningful, easily remembered
names instead of 32-bit addresses.

In flat TCP/IP networks, each machine on the network has a file (/etc/hosts)
containing the name-to-Internet-address mapping information for every host
on the network.

When TCP/IP networks become very large, as on the Internet, naming is 
divided hierarchically. Typically, the divisions follow the network’s 
organization. In TCP/IP, hierarchical naming is known as the domain name 
service (DNS) and uses the DOMAIN protocol. The DOMAIN protocol is 
implemented by the named daemon in TCP/IP. 

The default order in resolving host names is: 

1. BIND/DNS (named) 

2. Network Information Service (NIS) 

3. Local /etc/hosts file 

The default order can be overridden by creating a configuration file, called
/etc/netsvc.conf, and specifying the desired order. Both the default and
/etc/netsvc.conf can be overridden with the environment variable NSORDER.
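
The precedence just described (environment variable first, then /etc/netsvc.conf, then the default) can be sketched as follows. This is an illustration only: the "hosts = ..." line format, the keywords bind, nis, and local, and the NSORDER spelling reflect the usual AIX conventions but should be treated as assumptions here and checked against the AIX documentation.

```python
DEFAULT_ORDER = ["bind", "nis", "local"]   # default order listed in the text


def _split_order(value):
    """Split a comma-separated list of resolution methods."""
    return [item.strip() for item in value.split(",") if item.strip()]


def effective_order(netsvc_conf_text=None, nsorder=None):
    """Return the name resolution order that would take effect.

    netsvc_conf_text -- contents of /etc/netsvc.conf, or None if absent
    nsorder          -- value of the environment variable, or None if unset
    """
    if nsorder:
        return _split_order(nsorder)
    if netsvc_conf_text:
        for line in netsvc_conf_text.splitlines():
            line = line.split("#", 1)[0]          # drop comments
            if "=" not in line:
                continue
            key, value = line.split("=", 1)
            if key.strip() == "hosts":
                return _split_order(value)
    return list(DEFAULT_ORDER)


if __name__ == "__main__":
    print(effective_order())
    print(effective_order("hosts = local, bind"))
    print(effective_order("hosts = local, bind", nsorder="nis,local"))
```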

The /etc/resolv.conf file 

The /etc/resolv.conf file defines the domain and name server information for 
local resolver routines. If the /etc/resolv.conf file does not exist, then 
BIND/DNS is considered not to be set up or running. The system will attempt 
name resolution using the local /etc/hosts file. 

A sample /etc/resolv.conf file is:

# cat /etc/resolv.conf 
domain msc.itso.ibm.com 
search msc.itso.ibm.com itso.ibm.com 
nameserver 9.12.1.30 

In this sample, there is only one name server defined, with an address of
9.12.1.30. The system will query this domain name server for name
resolution. The default domain name to append to names that do not end with
a . (period) is msc.itso.ibm.com. The search list used when resolving a name
consists of msc.itso.ibm.com and itso.ibm.com.
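
To make the roles of the three keywords concrete, the sample file can be parsed with a short Python sketch. The function names are illustrative, and candidate_names is a deliberate simplification: a real resolver applies further rules (for example, it may also try the name as given), which are not modeled here.

```python
SAMPLE = """\
# cat /etc/resolv.conf
domain msc.itso.ibm.com
search msc.itso.ibm.com itso.ibm.com
nameserver 9.12.1.30
"""


def parse_resolv_conf(text):
    """Collect the domain, search list, and name servers from the file text."""
    config = {"domain": None, "search": [], "nameservers": []}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue                       # skip blank lines and comments
        keyword, values = fields[0], fields[1:]
        if keyword == "domain" and values:
            config["domain"] = values[0]
        elif keyword == "search":
            config["search"] = values
        elif keyword == "nameserver" and values:
            config["nameservers"].append(values[0])
    return config


def candidate_names(name, config):
    """List the fully qualified names tried for a name (simplified)."""
    if name.endswith("."):
        return [name]                      # already fully qualified
    suffixes = config["search"] or (
        [config["domain"]] if config["domain"] else []
    )
    return [f"{name}.{suffix}" for suffix in suffixes]


if __name__ == "__main__":
    cfg = parse_resolv_conf(SAMPLE)
    print(cfg["nameservers"])
    print(candidate_names("sp3n0", cfg))
```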


3.2.5 NIS 

NIS’ main purpose is to centralize administration of files, such as 
/etc/passwd, within a network environment. 

NIS separates a network into three components: Domain, server(s), and 
clients. 

A NIS domain defines the boundary where file administration is carried out. In
a large network, it is possible to define several NIS domains to break the
machines up into smaller groups. This way, files meant to be shared among
five machines, for example, stay within a domain that includes only those five
machines, not all the machines on the network.

A NIS server is a machine that provides the system files to be read by other 
machines on the network. There are two types of servers: Master and slave. 
Both keep a copy of the files to be shared over the network. A master server 
is the machine where a file may be updated. A slave server only maintains a 
copy of the files to be served. A slave server has three purposes: 

1. To balance the load if the master server is busy. 

2. To back up the master server. 

3. To enable NIS requests if there are different networks in the NIS domain. 
NIS client requests are not handled through routers; such requests go to a 
local slave server. It is the NIS updates between a master and a slave
server that go through a router.

A NIS client is a machine that has to access the files served by the NIS 
servers. 

There are four basic daemons that NIS uses: ypserv, ypbind, yppasswd, and 
ypupdated. NIS was initially called yellow pages; hence, the prefix yp is used 
for the daemons. They work in the following way: 

• All machines within the NIS domain run the ypbind daemon. This daemon 
directs the machine’s request for a file to the NIS servers. On clients and 
slave servers, the ypbind daemon points the machines to the master 
server. On the master server, its ypbind points back to itself. 

• ypserv runs on both the master and the slave servers. It is this daemon 
that responds to the request for file information by the clients. 

• yppasswd and ypupdated run only on the master server. The yppasswd
daemon makes it possible for users to change their login passwords anywhere on
the network. When NIS is configured, the /bin/passwd command is linked 
to the /usr/bin/yppasswd command on the nodes. The yppasswd command 
sends any password changes over the network to the yppasswd daemon 
on the master server. The master server changes the appropriate files and 
propagates this change to the slave servers using the ypupdated daemon. 
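
As a study aid, the placement of the daemons described above can be written down as a small lookup table. This is only a summary of the text in Python form; the dictionary and function names are illustrative.

```python
# Which NIS daemons run on which kind of machine, as described in the text.
NIS_DAEMONS = {
    "master": ["ypbind", "ypserv", "yppasswd", "ypupdated"],
    "slave": ["ypbind", "ypserv"],
    "client": ["ypbind"],
}


def roles_running(daemon):
    """Return the sorted list of roles on which the given daemon runs."""
    return sorted(
        role for role, daemons in NIS_DAEMONS.items() if daemon in daemons
    )


if __name__ == "__main__":
    for name in ("ypbind", "ypserv", "yppasswd", "ypupdated"):
        print(name, roles_running(name))
```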

Note


NIS serves files in the form of maps. There is a map for each of the files 
that it serves. Information from the file is stored in the map, and it is the 
map that is used to respond to client requests. 


By default, the following files are served by NIS: 

• /etc/ethers 

• /etc/group 

• /etc/hosts 

• /etc/netgroup 

• /etc/networks 

• /etc/passwd 

• /etc/protocols 

• /etc/publickey 

• /etc/rpc 

• /etc/security/group 

• /etc/security/passwd 

• /etc/services 

Tip 

By serving the /etc/hosts file, NIS has an added capability for handling
name resolution in a network. Please refer to the NIS and NFS publication by
O’Reilly and Associates for detailed information.


To configure NIS, there are four steps, all of which can be done through SMIT. 
For all four steps, first run smit nfs and select Network Information Service 
(NIS) to access the NIS panels, then: 

• Choose Change NIS Domain Name of this Host to define the NIS 
Domain. Figure 44 on page 82 shows what this SMIT panel looks like. In 
this example, SPDomain has been chosen as the NIS domain name. 

Figure 44. SMIT panel for setting a NIS domain name

• On the machine that is to be the NIS master (for example, the control 
workstation), select Configure/Modify NIS and then Configure this Host 
as a NIS Master Server. Figure 45 on page 83 shows the SMIT panel. Fill 
in the fields as required. Be sure to start the yppasswd and ypupdated 
daemons. When the SMIT panel is executed, all four daemons ypbind, 
ypserv, yppasswd, and ypupdated are started on the master server. This 
SMIT panel also updates the NIS entries in the local /etc/rc.nfs file(s). 

Figure 45. SMIT panel for configuring a master server

• On the machines set aside to be slave servers, go to the NIS SMIT panels 
and select Configure this Host as a NIS Slave Server. Figure 46 on 
page 84 shows the SMIT panel for configuring a slave server. This step 
starts the ypserv and ypbind daemons on the slave servers and updates 
the NIS entries in the local /etc/rc.nfs file(s). 

Figure 46. SMIT panel for configuring a slave server

• On each node that is to be a NIS client, go into the NIS panels and select 
Configure this Host as a NIS Client. This step starts the ypbind daemon 
and updates the NIS entries in the local /etc/rc.nfs file(s). 

Figure 47. SMIT panel for configuring a NIS client


Once NIS is configured, whenever any of the files served by NIS changes,
the corresponding maps on the master are rebuilt and either pushed to the
slave servers or pulled by the slave servers from the master server. This
is done through the SMIT panel or the make command. To access the SMIT
panel, select Manage NIS Maps within the NIS panel. Figure 48 on page 86
shows this SMIT panel.

Figure 48. SMIT panel for managing NIS maps

Select Build/Rebuild Maps for this Master Server, and then either have the
system rebuild all the maps with the all option or specify the maps that you
want to rebuild. After that, return to the SMIT panel shown in Figure
48 on page 86, and select either Transfer Maps to Slave Servers (from the
master server) or Retrieve Maps from Master Server to this Slave (from a
slave server).


3.2.6 DNS 

DNS (Domain Name System) is the way that host names are organized on the
Internet using TCP/IP. DNS is used to look up, or resolve, the name by which
we know a system and convert it to a TCP/IP address. All movement of data
on a TCP/IP network is done using addresses, not host names; DNS exists
to make it easy for people to manage and work with the computer network.

If your SP system is at a site with many systems, you can use DNS to
delegate the responsibility for naming systems to other people or sites. You
can also reduce your administration workload because you only have to update
one server when you want to change the address of a system.


DNS uses a name space in a similar way to the directories and subdirectories
we are used to. Instead of a / between names to show that we are going to
the next level down, DNS uses a period or full stop.

In the same way as / is the root directory for UNIX, DNS has "." as the root
of the name space. Unlike UNIX, if you leave out the full stop or period at the
end of the DNS name, DNS will try various full or partial domain names for
you. One other difference is that, reading left to right, DNS goes from the
lowest level to the highest, whereas the UNIX directory tree goes from the
highest to the lowest.

For example, the domain ibm.com is a subdomain of the .com domain. The
domain itso.ibm.com is a subdomain of the ibm.com domain and the .com
domain.
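
The left-to-right, lowest-to-highest reading of a DNS name can be illustrated with a short Python sketch. The function names are illustrative; the code only manipulates the labels of a name and performs no DNS queries.

```python
def parent_domains(name):
    """Return the enclosing domains of a name, from nearest to top level."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(1, len(labels))]


def is_subdomain(name, domain):
    """True if name lies below domain in the DNS name space."""
    return domain.strip(".") in parent_domains(name)


if __name__ == "__main__":
    print(parent_domains("itso.ibm.com"))
    print(is_subdomain("itso.ibm.com", ".com"))
```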

You can set up your SP system without DNS. In that case, a file called
/etc/hosts on each system defines the mapping from names to TCP/IP addresses.
Because each system has to have a copy of the /etc/hosts file, this becomes
difficult to maintain for even a small number of systems. Even though setting
up DNS is more difficult initially, the administrative workload for three or four
workstations may be lighter than with /etc/hosts, and maintaining a network of
20 or 30 workstations is just as easy as maintaining three or four. It
is common for an SP system implementation to use DNS in lieu of /etc/hosts.

When you set up DNS, you do not have to match your physical network to 
your DNS setup, but there are some good reasons why you should. Ideally, 
the primary and secondary name servers should be the systems that have the 
best connections to other domains and zones. 


3.3 The SP networks 

You can connect many different types of LANs to the SP system, but 
regardless of how many you use, the LANs fall into one of the following 
categories. 

3.3.1 SP Ethernet 

The SP requires an Ethernet connection between the control workstation and 
all nodes, which is used for network installation of the nodes and for system 
management. In order for PSSP installation to function, you must connect the
SP Ethernet to the Ethernet adapter in the lowest hardware slot of all
Ethernet adapters on each SP node. When a node is network booted, it
selects the lowest Ethernet adapter and performs the install from it. This
Ethernet adapter must be on the same subnet as an Ethernet adapter on the
node’s boot/install server. In nodes that have one, the integrated Ethernet
adapter is always the lowest Ethernet adapter. Be sure to maintain this
relationship when adding Ethernet adapters to a node. This section describes 
the setup of that administrative Ethernet, which is often called the SP LAN. 

3.3.1.1 Frame and node cabling 

SP frames include coaxial Ethernet cabling for the SP LAN, also known as 
thin-wire Ethernet or 10BASE-2. All nodes in a frame can be connected to 
that medium through the BNC connector of either their integrated 10 Mbps 
Ethernet or a suitable 10 Mbps Ethernet adapter using T-connectors. Access 
to the medium is shared among all connected stations and controlled by 
Carrier Sense, Multiple Access/Collision Detect (CSMA/CD). 10BASE-2 only 
supports half duplex (HDX). There is a hard limit of 30 stations on a single 
10BASE-2 segment, and the total cable length must not exceed 185 meters. 
However, it is not advisable to connect more than 16 to 24 nodes to a single 
segment. Normally, there is one segment per frame, and one end of the 
coaxial cable is terminated in the frame. Depending on the network topology, 
the other end connects the frame to either the control workstation or to a 
boot/install server in that segment and is terminated there. In the latter case, 
the boot/install server and CWS are connected through an additional Ethernet 
segment; so, the boot/install server needs two Ethernet adapters. 

It is also possible to use customer-provided Unshielded Twisted Pair (UTP) 
cabling of category 3, 4, or 5. A UTP cable can be directly connected to the
RJ-45 Twisted Pair (TP) connector of the Ethernet adapter if one is available 
or through a transceiver/media converter to either the AUI or BNC connector. 
Twisted Pair connections are always point-to-point connections. So, all nodes 
have to be connected to a customer-provided repeater or Ethernet switch, 
which is normally located outside the SP frame and is also connected to the 
control workstation. Consequently, using UTP involves much more cabling. 
On the other hand, fault isolation will be much easier with UTP than with 
thin-wire Ethernet, and there are more opportunities for performance 
improvements. Twisted Pair connections at 10 Mbps are called 10BASE-T, 
those operating at 100 Mbps are called 100BASE-TX. 

In order to use Twisted Pair in full duplex mode, there must be a native RJ-45 
TP connector at the node (no transceiver), and an Ethernet switch, such as 
the IBM 8274, must be used. A repeater always works in half duplex mode 
and will send all IP packets to all ports (such as in the 10BASE-2 LAN 
environment). We, therefore, recommend always using an Ethernet switch
with native UTP connections.

The POWER3 SMP nodes (made available in 1999) have an integrated
10/100 Mbps Ethernet adapter. They may still be connected and installed at
10 Mbps using 10BASE-T or 10BASE-2 and a transceiver. However, to fully
utilize the adapter at 100 Mbps, category 5 UTP wiring to a 100 Mbps
repeater or Ethernet switch is required (100BASE-TX). As mentioned above,
we recommend the use of an Ethernet switch since this allows you to use
full duplex mode and avoids collisions. The control workstation also needs a
fast connection to this Ethernet switch.

3.3.1.2 SP LAN topologies 

The network topology for the SP LAN mainly depends on the size of the
system and should be planned on an individual basis. We strongly
recommend providing additional network connectivity (through the SP Switch
or additional Ethernet, Token Ring, FDDI, or ATM networks) if the
applications on the SP perform significant communication among nodes. To
avoid overloading the SP LAN with application traffic, use it only for
SP node installations and system management, and have applications use
these additional networks.

In the following, only the SP LAN is considered. We show some typical 
network topologies, their advantages, and limitations. 

Shared 10BASE-2 network 

In relatively small SP configurations, such as single frame systems, the 
control workstation and nodes typically share a single thin-wire Ethernet. 
Figure 49 shows this setup. 

Figure 49. Shared 10BASE-2 SP network

This configuration is characterized by the following properties: 

• No routing is required since the CWS and all nodes share one subnet. 

• Consequently, the whole SP LAN is a single broadcast domain as well as 
a single collision domain. 

• The CWS acts as boot/install server for all nodes. 

• Performance is limited to one 10 Mbps HDX connection at a time. 

• Only six to eight network installs of SP nodes from the CWS NIM server 
can be performed simultaneously. 

Even if this performance limitation is accepted, this setup is limited by the 
maximum number of 30 stations on a 10BASE-2 segment. In practice, not 
more than 16 to 24 stations should be connected to a single 10BASE-2 
Ethernet segment. 
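
The 10BASE-2 limits quoted above lend themselves to a quick planning check. The sketch below is illustrative only: the function name and messages are assumptions, and treating 24 stations as the advisory ceiling is one reading of the "16 to 24" guidance.

```python
HARD_STATION_LIMIT = 30    # hard limit of stations on one 10BASE-2 segment
ADVISED_STATIONS = 24      # upper end of the "16 to 24" practical guidance
MAX_CABLE_METERS = 185     # maximum total cable length of one segment


def check_segment(stations, cable_meters):
    """Return a list of problems found for one 10BASE-2 segment."""
    problems = []
    if stations > HARD_STATION_LIMIT:
        problems.append(f"{stations} stations exceeds the hard limit of 30")
    elif stations > ADVISED_STATIONS:
        problems.append(f"{stations} stations exceeds the advised 16 to 24")
    if cable_meters > MAX_CABLE_METERS:
        problems.append(f"{cable_meters} m of cable exceeds 185 m")
    return problems


if __name__ == "__main__":
    # A single frame with 16 nodes plus the control workstation.
    print(check_segment(stations=17, cable_meters=100))
    print(check_segment(stations=31, cable_meters=200))
```

Remember to count the control workstation, and any boot/install server adapter on the segment, as stations.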

Segmented 10BASE-2 network 

A widely used approach to overcome the limitations of a single shared 
Ethernet is segmentation. The control workstation is equipped with additional 
Ethernet adapters, and each one is connected to a different shared 
10BASE-2 Ethernet subnet. This is shown in Figure 50. 

Figure 50. Segmented 10BASE-2 SP network with two subnets

For a configuration with N separate subnets (and consequently N Ethernet 
cards in the CWS), the following holds: 

• Nodes in one subnet need static routes to the (N-1) other subnets through 
the CWS, and routing (or IP forwarding) must be enabled on the CWS. 

• The SP LAN is split into N broadcast domains. 

• The CWS acts as boot/install server for all nodes since it is a member of 
all N subnets. 

• Aggregate performance is limited to a maximum of N times 10 Mbps HDX. 
However, this is only achievable if the CWS communicates with one node 
in each of the subnets simultaneously. 

• Only six to eight network installs per subnet should be performed
simultaneously, which raises the overall maximum to between 6N and 8N
simultaneous installs.

This approach is limited primarily by the number of available adapter slots in 
the control workstation but also by the ability of the CWS to simultaneously 
handle the traffic among these subnets or to serve 6N to 8N simultaneous 
network installations. In practice, more than four subnets should not be used. 
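
The properties listed for a configuration with N subnets can be tabulated with a few lines of Python. This is a sketch for study purposes; the function and key names are illustrative, and the figures are simply the ones stated in the text.

```python
def segmented_lan_summary(n_subnets):
    """Summarize a segmented 10BASE-2 SP LAN with n_subnets subnets."""
    if n_subnets < 1:
        raise ValueError("at least one subnet is required")
    return {
        "cws_ethernet_adapters": n_subnets,
        "static_routes_per_node": n_subnets - 1,    # to the other subnets
        "broadcast_domains": n_subnets,
        "max_aggregate_mbps": 10 * n_subnets,       # half duplex, best case
        "simultaneous_installs": (6 * n_subnets, 8 * n_subnets),
        "within_practical_limit": n_subnets <= 4,   # "more than four" advice
    }


if __name__ == "__main__":
    for key, value in segmented_lan_summary(2).items():
        print(f"{key}: {value}")
```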

Segmented 10BASE-2 networks with Boot/Install servers 

For very large systems, where the above model of segmentation would 
require more 10 Mbps Ethernet adapters in the control workstation than 
possible, a more complex network setup can be deployed that uses additional 
boot/install servers. This is shown in Figure 51 on page 92. The CWS is 
directly connected to only one Ethernet segment, which is attached to the 10
Mbps Ethernet adapter en0 of a set of N boot/install server (BIS) nodes,
typically the first node in each frame. We call this Ethernet subnet the Install 
Ethernet since it is the network through which the CWS installs the 
boot/install server nodes. The remaining nodes are grouped into N additional 
Ethernet segments (typically one per frame), which are not directly connected 
to the CWS. Instead, each of these subnets is connected to one of the 
boot/install servers through a second 10 Mbps Ethernet adapter in the 
boot/install servers. 





Figure 51. Segmented SP network with Boot/Install server hierarchy 

With such a network configuration: 

• Routing is complicated: 

• Non-BIS nodes in a segment have routes to all other segments through 
their BIS node. 

• BIS nodes have routes to the (N-1) other nodes’ segments through the 
BIS nodes attached to that segment. 

• The CWS has routes to the N nodes’ segments through the BIS nodes 
in these segments. 

• The SP LAN is split into (N+1) broadcast domains. 

• The boot/install servers are installed from the NIM server on the CWS. 
After this, all non-BIS nodes are installed by the boot/install servers. Note
that some NIM resources, such as the LPPSOURCE, are only served by
the CWS.


• The maximum bandwidth in the Install Ethernet segment (including the 
CWS) is 10 Mbps HDX. 

• Only six to eight BIS nodes can be installed simultaneously from the CWS 
in a first installation phase. In a second phase, each BIS node can install 
six to eight nodes in its segment simultaneously. 
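
To make the routing rules listed above more concrete, the following Python
sketch builds the destination-to-gateway tables for a hierarchy with N node
segments. It is a model only: the segment names (seg1, seg2, ...), the
gateway labels (bis1, bis2, ...), and the label "install" for the Install
Ethernet are hypothetical placeholders, and no real routing commands are
issued.

```python
def bis_hierarchy_routes(n_segments):
    """Build destination -> gateway maps for the CWS, BIS nodes, and non-BIS nodes.

    All names are hypothetical labels used for illustration only.
    """
    if n_segments < 1:
        raise ValueError("at least one node segment is required")
    segments = [f"seg{i}" for i in range(1, n_segments + 1)]
    gateway = {seg: f"bis{i}" for i, seg in enumerate(segments, start=1)}

    # The CWS reaches each of the N node segments through the BIS node
    # attached to that segment.
    cws = dict(gateway)

    # Each BIS node reaches the (N-1) other node segments through the BIS
    # node attached to the destination segment.
    bis = {
        gateway[own]: {seg: gw for seg, gw in gateway.items() if seg != own}
        for own in segments
    }

    # Non-BIS nodes send all traffic that leaves their segment to their own
    # BIS node, including traffic for the Install Ethernet.
    non_bis = {
        own: {dest: gateway[own] for dest in ["install"] + segments if dest != own}
        for own in segments
    }
    return cws, bis, non_bis


if __name__ == "__main__":
    cws, bis, non_bis = bis_hierarchy_routes(3)
    print("CWS routes:", cws)
    print("bis1 routes:", bis["bis1"])
    print("non-BIS routes in seg2:", non_bis["seg2"])
```

Even in this small model, every additional frame adds entries to every
table, which illustrates why routing in this configuration is hard to
maintain.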

Apart from the complex setup, this configuration suffers from several 
problems. Communication between regular nodes in different subnets 
requires routing through two boot/install server nodes. All this traffic, and all 
communications with the CWS (routed through one BIS node), have to 
compete for bandwidth on the single shared 10 Mbps half duplex Install 
Ethernet. 

The situation can be improved by adding a dedicated router. Connecting all
the nodes’ segments to this router removes the routing traffic from the BIS
nodes, and using a fast uplink connection to the CWS provides an alternative,
high-bandwidth path to the CWS. The BIS nodes in each segment are still
required because the network installation process requires that the NIM 
server and the client are in the same broadcast domain. Figure 52 on page 94 
shows such a configuration. Nodes in the frames now have a route to the 
control workstation and the other frames’ networks through the router, which 
off-loads network traffic from the BIS nodes.

Even when a router is added, the solution presented in the following section
is normally preferable to a segmented network with boot/install servers both 
from a performance and from a management/complexity viewpoint. 

Switched 10BASE-2 network 

An emerging technology to overcome performance limitations in shared or 
segmented Ethernet networks is Ethernet switching, which is sometimes 
called micro-segmentation. An SP example is shown in Figure 53 on page 95.

Figure 53. Switched 10BASE-2 SP network with Fast Uplink

This configuration has the following properties: 

• No routing is required. All Ethernet segments are transparently combined 
to one big LAN by the Ethernet switch. 

• Of course, node-to-node connections within a single Ethernet segment still
have to share that 10BASE-2 medium in half duplex mode. But many
communications between different ports can be switched simultaneously 
by the Ethernet switch. The uplink to the control workstation can be 
operated in a 100 Mbps full duplex mode. 

• The control workstation can act as the boot/install server for all nodes 
since the Ethernet switch combines the CWS and nodes into one big 
network (or broadcast domain). 

This setup eliminates the routing overhead for communications between 
nodes or a node and the control workstation. With a 100 Mbps, full duplex 
Ethernet uplink to the CWS, there should also be no bottleneck in the 
connection to the CWS, at least if the number of 10BASE-2 segments is not 
much larger than ten.

Considering only the network topology, the control workstation should be able
to install six to eight nodes in each Ethernet segment (port on the Ethernet 
switch) simultaneously since each Ethernet segment is a separate collision 
domain. Rather than the network bandwidth, the limiting factor most likely is 
the ability of the CWS itself to serve a very large number of NIM clients 
simultaneously, for example, answering UDP bootp requests or acting as the
NFS server for the mksysb images. To quickly install a large SP system, it 
may still be useful to set up boot/install server nodes, but the network 
topology itself does not require boot/install servers. For an installation of all
nodes of a large SP system, we recommend the following procedure:

1. Using the spbootins command, set up approximately as many boot/install 
server nodes as can be simultaneously installed from the CWS. 

2. Install the BIS nodes from the control workstation. 

3. Install the non-BIS nodes from their respective BIS nodes. This provides 
the desired scalability for the installation of a whole, large SP system. 

4. Using the spbootins command, change the non-BIS nodes’ configuration 
so that the CWS becomes their boot/install server. Do not forget to run 
setup_server to make these changes effective. 

5. Reinstall the original BIS nodes. This removes all previous NIM data from 
them since no other node is configured to use them as boot/install server. 

Using this scheme, the advantages of both a hierarchy of boot/install servers 
(scalable, fast installation of the whole SP system) and a flat network with 
only the CWS acting as a NIM server (less complexity, less disk space for BIS 
nodes) are combined. Future reinstallations of individual nodes (for example 
after a disk crash in the root volume group) can be served from the control 
workstation. Note that the CWS will be the only file collection server if the BIS 
nodes are removed, but this should not cause performance problems. 
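
The planning behind this scheme can be sketched in a few lines of Python.
The sketch below only models which nodes would become temporary BIS nodes
and how the remaining nodes would be distributed among them; it does not
call spbootins or setup_server. The function names, the round-robin
distribution, and the default of eight parallel installs (the upper end of
the six-to-eight guideline) are our own assumptions.

```python
import math


def plan_bis_install(nodes, max_parallel=8):
    """Pick temporary BIS nodes and assign the remaining nodes to them.

    Returns a dict mapping each BIS node to the list of its client nodes.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    bis_nodes = list(nodes[:max_parallel])   # installed from the CWS first
    clients = list(nodes[max_parallel:])     # installed from the BIS nodes
    plan = {bis: [] for bis in bis_nodes}
    for index, node in enumerate(clients):
        # Round-robin keeps the per-BIS load as even as possible.
        plan[bis_nodes[index % len(bis_nodes)]].append(node)
    return plan


def install_phases(plan, max_parallel=8):
    """Count install phases: one for the BIS nodes plus the client waves."""
    if not plan:
        return 0
    client_waves = max(math.ceil(len(c) / max_parallel) for c in plan.values())
    return 1 + client_waves


if __name__ == "__main__":
    plan = plan_bis_install(list(range(1, 65)))
    for bis, clients in plan.items():
        print(f"BIS node {bis}: clients {clients}")
    print("install phases:", install_phases(plan))
```

After the last phase, steps 4 and 5 of the procedure return the system to a
flat layout in which the CWS is the only boot/install server.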

The configuration shown in Figure 53 on page 95 scales well to about 128 
nodes. For larger systems, the fact that all the switched Ethernet segments 
form a single broadcast domain can cause network problems if operating 
system services or applications frequently issue broadcast messages. Such 
events may cause broadcast storms, which can overload the network. For 
example, Topology Services from the RS/6000 Cluster Technology use 
broadcast messages when the group leader sends PROCLAIM messages to 
attract new members.

Note: ARP cache tuning

Be aware that for SP systems with very large networks (and/or routes to 
many external networks), the default AIX settings for the ARP cache size 
might not be adequate. The Address Resolution Protocol (ARP) is used to 
translate IP addresses to Media Access Control (MAC) addresses and vice 
versa. Insufficient ARP cache settings can severely degrade your network’s 
performance, in particular when many broadcast messages are sent. Refer
to /usr/lpp/ssp/README/ssp.css.README for more information about ARP
cache tuning.


In order to avoid problems with broadcast traffic, no more than 128 nodes 
should be connected to a single switched Ethernet subnet. Larger systems 
should be set up with a suitable number of switched subnets. To be able to 
network boot and install from the CWS, each of these switched LANs must 
have a dedicated connection to the control workstation. This can be 
accomplished either through multiple uplinks between one Ethernet switch 
and the CWS or through multiple switches each having a single uplink to the 
control workstation. 
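
As a quick planning aid, the number of switched subnets (and therefore the
number of dedicated CWS connections) follows directly from the 128-node
guideline. The helper below is a minimal sketch with a name of our own
choosing; it is not a PSSP tool.

```python
import math


def switched_subnets_needed(node_count, max_nodes_per_subnet=128):
    """Return how many switched Ethernet subnets a system of this size needs.

    Each subnet also needs its own connection to the control workstation,
    so the result is the number of CWS uplinks as well.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return math.ceil(node_count / max_nodes_per_subnet)


if __name__ == "__main__":
    for nodes in (64, 128, 129, 512):
        print(nodes, "nodes ->", switched_subnets_needed(nodes), "subnet(s)")
```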

Shared or switched 100BASE-TX network 

With the introduction of the POWER3 SMP nodes in 1999, it has become 
possible to operate nodes on the SP LAN at 100 Mbps including network 
installation. This requires UTP cabling as outlined in 3.3.1.1, “Frame and 
node cabling” on page 88. 

One possible configuration would be to use a repeater capable of sustaining 
100 Mbps and a fast Ethernet adapter in the control workstation. This would 
boost the available bandwidth up to 100 Mbps, but it would be shared among 
all stations, and connections are only half duplex. Although the bandwidth
would be higher by a factor of ten compared to a 10BASE-2 SP Ethernet, we
recommend using an Ethernet switch that supports full duplex connections at
100 Mbps instead of a repeater. The Ethernet switch can process many
node-to-node and node-to-CWS connections simultaneously, whereas a
repeater offers only shared access. This configuration is shown in
Figure 54 on page 98. As discussed in the previous section, the limiting factor 
for the number of simultaneous network installations of nodes will probably be 
the processing power of the control workstation, not the network bandwidth.

For larger SP configurations, the cabling required to establish point-to-point
connections from all nodes to the Ethernet switch can be extensive. An
IBM 8274 Nways LAN RouteSwitch could be used to provide the required 
switching capacities. Models with 3, 5, or 9 switching modules are available. 

Heterogeneous 10/100 Mbps network 

In many cases, an existing SP system will be upgraded with new nodes that
have fast Ethernet connections, but older or less heavily loaded nodes should
continue to run with 10 Mbps SP LAN connections. A typical scenario with
connections at both 10 Mbps and 100 Mbps is shown in Figure 55 on page
99.

In this configuration, again an Ethernet Switch, such as the IBM 8274, is used
to provide a single LAN and connects to the control workstation at 100 Mbps 
FDX. One frame has new nodes with 100 Mbps Ethernet. These nodes are 
individually cabled by 100BASE-TX Twisted Pair to ports of the Ethernet 
switch and operate in full duplex mode as in the previous example. Two 
frames with older nodes and 10BASE-2 cabling are connected to ports of the 
same Ethernet switch using media converters as in the configuration shown 
in Figure 53 on page 95. Ideally, a switching module with autosensing ports is 
used, which automatically detects the communication speed. 

3.3.2 Additional LANs 

The SP Ethernet can provide a means to connect all nodes and the control 
workstation to your site networks. However, it is likely that you will want to 
connect your SP nodes to site networks through other network interfaces. If 
the SP Ethernet is used for other networking purposes, the amount of 
external traffic must be limited. If too much traffic is generated on the SP 
Ethernet, the administration of the SP nodes might be severely impacted. For
example, problems might occur with network installs, diagnostic functions,
and maintenance mode access. 

Ethernet, Fiber Distributed Data Interface (FDDI), and token-ring are also 
configured by the SP. Other network adapters must be configured manually. 
These connections can provide increased network performance in user file 
serving and other network related functions. You need to assign all the 
addresses and names associated with these additional networks. 

3.3.3 IP over the switch 

If your SP has a switch, and you want to use IP for communications over the 
switch, each node needs to have an IP address and name assigned to its 
switch interface, the css0 adapter. If hosts outside the SP Switch network
need to communicate over the switch using IP with nodes in the SP system, 
those hosts must have a route to the switch network through one of the SP 
nodes or through the SP Switch router. 

If you are not enabling ARP on the switch, specify the switch network subnet 
mask and the starting node’s IP address. After the first address is selected, 
subsequent node addresses are based on the switch port number assigned. 
Unlike all other network interfaces, which can have sets of nodes divided into 
several different subnets, the switch IP network must be one contiguous 
subnet that includes all the nodes in the system partition. 

If you want to assign your switch IP addresses as you do your other adapters, 
you must enable ARP for the css0 adapter. If you enable ARP for the css0
adapter, you can use whatever IP addresses you wish, and those IP 
addresses do not have to be in the same subnet for the whole system. 
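
To illustrate how switch addresses can follow from a starting address when
ARP is not enabled, the Python sketch below assumes that a node's address is
simply the starting address plus its switch port number. That offset rule,
the function name, and the 192.168.14.x addresses are assumptions made for
illustration only; consult the PSSP documentation for the exact assignment
rule on your system.

```python
import ipaddress


def switch_ip(start_ip, netmask, switch_port):
    """Derive a switch IP address from the starting address and a port number.

    Assumption for illustration: address = starting address + switch port.
    The result must stay inside the single contiguous switch subnet.
    """
    if switch_port < 0:
        raise ValueError("switch_port must not be negative")
    network = ipaddress.ip_network(f"{start_ip}/{netmask}", strict=False)
    address = ipaddress.ip_address(start_ip) + switch_port
    if address not in network or address == network.broadcast_address:
        raise ValueError(f"port {switch_port} does not fit into {network}")
    return str(address)


if __name__ == "__main__":
    # Hypothetical starting address and netmask.
    for port in (0, 1, 15):
        print(port, switch_ip("192.168.14.1", "255.255.255.0", port))
```

The range check reflects the requirement that, without ARP, all nodes in the
system partition must fit into one contiguous switch subnet.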

3.3.4 Subnetting considerations 

All but the simplest SP system configurations will likely include several 
subnets. Thoughtful use of netmasks in planning your networks can 
economize on the use of network addresses. 

As an example, consider an SP Ethernet where none of the six subnets 
making up the SP Ethernet have more than 16 nodes on them. A netmask of 
255.255.255.224 provides 30 discrete addresses per subnet. Using 
255.255.255.224 as a netmask, we can then allocate the address ranges as 
follows: 

• 192.168.3.1-30 to the control workstation to node 1 subnet

• 192.168.3.33-62 to the frame 1 subnet

• 192.168.3.65-94 to the frame 2 subnet

By contrast, if we used 255.255.255.0 as our netmask, we would have
to use four separate Class C network addresses to satisfy the same wiring
configuration (that is, 192.168.3.x, 192.168.4.x, 192.168.5.x, and
192.168.6.x). An example of SP Ethernet subnetting is shown in Figure 56.
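
The effect of the 255.255.255.224 netmask can be checked with the Python
standard library. The sketch below splits the 192.168.3.0 network from the
example into subnets and lists the usable host range of each one (network
and broadcast addresses excluded). The function name is our own.

```python
import ipaddress


def subnet_host_ranges(block, netmask):
    """Split an address block by netmask and list usable hosts per subnet.

    Returns a list of (first_host, last_host, host_count) tuples.
    """
    network = ipaddress.ip_network(block)
    # Convert the dotted netmask into a prefix length (27 for .224).
    prefix = ipaddress.ip_network(f"0.0.0.0/{netmask}").prefixlen
    ranges = []
    for subnet in network.subnets(new_prefix=prefix):
        hosts = list(subnet.hosts())
        ranges.append((str(hosts[0]), str(hosts[-1]), len(hosts)))
    return ranges


if __name__ == "__main__":
    for first, last, count in subnet_host_ranges("192.168.3.0/24", "255.255.255.224"):
        print(f"{first} - {last} ({count} usable addresses)")
```

A single Class C network yields eight such subnets, which is enough for the
six subnets of the example with room to spare.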

Consider the example of a multi-frame SP that has a CWS with separate 
Ethernet connections to the node in the first slot in each frame. Each first 
node has a network that connects to every other node in that frame. 





Figure 56. SP Ethernet subnetting example 


3.4 Routing considerations 

When planning routers, especially router nodes, in your system, several 
factors can help determine the number of routers needed and their placement 
in the SP configuration. The number of routers you need can vary depending
on your network type (in some environments, router nodes might also be 
called gateway nodes). 

For nodes that use Ethernet or Token-Ring as the routed network, CPU 
utilization may not be a big problem. For nodes that use FDDI as the 
customer routed network, a customer network running at or near maximum 
bandwidth results in high CPU utilization on the router node. Applications,
such as POE and the Resource Manager, should run on nodes other than
FDDI routers. However, Ethernet and Token Ring gateways can run with 
these applications. 

For systems that use Ethernet or Token Ring routers, traffic can be routed 
through the SP Ethernet. For FDDI networks, traffic should be routed across 
the switch to the destination nodes. The amount of traffic coming in through 
the FDDI network can be up to ten times the bandwidth that the SP Ethernet 
can handle. 

For bigger demands on routing and bandwidth, the SP Switch router can be a 
real benefit. Refer to 2.5.1, “SP Switch Router” on page 26 for details. 


3.5 Related documentation 

The following documentation will help you understand the concepts and
examples covered in this guide. Refer to the documentation mentioned in
this chapter to maximize your chances of success in the SP certification
exam.

SP manuals 

RS/6000: Planning Volume 2, GA22-7281. This book is essential to 
understand the planning and requirements of SP system networking. 

RS/6000 SP: Planning Vol 1, Hardware and Physical Environment , 
GA22-7280, Chapter 15. This chapter will help you to understand the 
SP-attached Server. 

RS/6000 SP Overview, Planning and Installation Course AU91. This course 
material is easy to follow and will help you to understand your networking 
configuration. 

SP redbooks 

Inside the RS/6000 SP, SG24-5145. This book will help you to understand 
how the RS/6000 SP is affected by the network. 

Others 

IBM Certification Study Guide: AIX V.4.3 System Support, SG24-5139. This
book helps you understand the parts of the SP system that relate closely to
networking design.

TCP/IP, SNA, HACMP, and Multiple Systems, SG24-4653. This redbook 
contains in-depth discussion on protocols and will help you to strengthen your 
knowledge in this area.


3.6 Sample questions

This section provides a series of questions to help you prepare for
the certification exam. The answers to these questions can be found in
Appendix A.

1. The SP requires an Ethernet connection between the control workstation 
and all nodes. Which of the following tasks do NOT use the SP Ethernet? 

A. Network installation 

B. System management 

C. Event monitoring 

D. Hardware control 

2. Setting up host name resolution is essential to all the PSSP components.
The name associated with the en0 interface is known as:

A. Initial hostname 

B. Reliable hostname 

C. Hostname 

D. Primary name 

3. What is the default order for resolving host names if /etc/resolv.conf is 
present? 

A. /etc/hosts - DNS - NIS 

B. DNS - NIS - /etc/hosts 

C. NIS - DNS - /etc/hosts

D. NIS - /etc/hosts - DNS 

4. In a possible scenario with a segmented 10BASE-2 network, the control
workstation is equipped with additional Ethernet adapters. Nodes in each 
separate segment will need: 

A. A boot/install server for that segment 

B. A route to the control workstation 

C. A default route set to one of the nodes or a router on that segment 

D. All the above 

5. Consider an SP Ethernet where none of the six subnets making up the SP 
Ethernet have more than 16 nodes on them. How many discrete 
addresses per subnet does a netmask of 255.255.255.224 provide? 

A. 16

B. 32

C. 30 

D. 8 

6. Which network adapter must be manually configured? 

A. Ethernet 

B. FDDI 

C. ATM 

D. Token Ring 

7. The default order to resolve host names can be overwritten by creating a 
configuration file and specifying the desired order. Which of the following 
is the correct location and name of the configuration file? 

A. /etc/netservice.conf 

B. /netservice.conf 

C. /etc/netsvc.conf 

D. netsvc.conf 

8. Which of the following daemons is NOT used by NIS? 

A. ypserv 

B. ypbind 

C. ypupdated 

D. yppassword 

9. Which of the following statements is a characteristic of a NIS slave server? 

A. Backs up other slave servers 

B. Balances the load if the primary slave server is busy 

C. Enables NIS requests if there are different networks in the NIS domain 

D. Disables NIS request if there are different networks in the NIS domain 

10. Which of the following files is NOT served by NIS? 

A. /etc/rpc 

B. /etc/publickey 

C. /etc/networks 

D. /etc/ethernets


3.7 Exercises

Here are some exercises you may wish to perform: 

1. On a test system that does not affect any users, practice setting up new 
static routes using the command line. 

2. Which commands can be used to configure the SP Ethernet for the nodes 
in the SDR? (Refer to the study guide test environment on page 3 for this 
exercise.) 

3. Which netmask can be used for the study guide test environment on page
3? What happens to the netmask if we add a third Ethernet segment to the 
environment? 

4. On a test system that does not affect any users, use the environment
variable NSORDER to change the default order to resolve host names.

5. On a test system that does not affect any users, configure NIS.


Chapter 4. I/O devices and file systems


This chapter provides an overview of internal and external I/O devices and 
how they are supported in RS/6000 SP environments. It also discusses
file systems and their use in the RS/6000 SP.


4.1 Key concepts you should study 

Before taking the certification exam, make sure you understand the following 
concepts: 

• Support for external I/O devices. 

• Possible connections of I/O devices, such as SCSI, RAID, and SSA. 

• Network File System (NFS). How it works, and how it is utilized in the 
RS/6000 SP especially for installation. 

• Basic understanding of AFS and DFS file systems and their potential in 
RS/6000 SP environments. 


4.2 I/O devices 

Anything that is not memory or CPU can be considered an Input/Output device
(I/O device). I/O devices include internal and external storage devices as well 
as communications devices, such as network adapters, and in general, any 
devices that can be used for moving data. 

4.2.1 External disk storage 

If external disk storage is part of your system solution, you need to decide
which of the external disk subsystems available for the SP best satisfies your
needs.

Disk options offer the following trade-offs in price, performance, and 
availability: 

• For availability, you can use either a RAID subsystem with RAID 1 or RAID 
5 support, or you can use mirroring. 

• For best performance when availability is needed, you can use mirroring 
or RAID 1, but these require twice the disk space. 

• For low cost and availability, you can use RAID 5, but there is a
performance penalty for write operations: one write requires four I/Os, a
read and a write to each of two separate disks in the RAID array. An N+P
(parity) RAID 5 array, composed of N+1 disks, offers N disks worth of
storage; therefore, it does not require twice as much disk space.

Also, the use of RAID 5 arrays and hot spares affects the relationship between
raw storage and available and protected storage. RAID 5 arrays,
designated in the general case as N+P arrays, provide N disks worth of 
storage. For example, an array of eight disks is a 7+P RAID 5 array 
providing seven disks worth of available protected storage. A hot spare 
provides no additional usable storage but provides a disk that quickly 
replaces a failed disk in the RAID 5 array. All disks in a RAID 5 array 
should be the same size; otherwise, disk space will be wasted. 

• For low cost when availability due to disk failure is not an issue, you can
use what is known as JBOD (Just a Bunch of Disks).

After you choose a disk option, be sure to get enough disk drives to satisfy 
the I/O requirements of your application, taking into account if you are using 
the Recoverable Virtual Shared Disk optional component of PSSP, mirroring, 
or RAID 5 and whether I/O is random or sequential. 
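
The capacity trade-offs described above can be compared with a short
calculation. The Python sketch below is illustrative only: the function
name and the scheme labels are our own, mirroring is assumed to keep two
copies, and all active disks are assumed to form a single RAID 5 array.

```python
def usable_disks(total_disks, scheme, hot_spares=0):
    """Return how many disks' worth of capacity remain available for data.

    scheme is one of "jbod", "mirror" (two copies assumed), or "raid5"
    (one N+P array assumed). Hot spares add no usable storage.
    """
    active = total_disks - hot_spares
    if hot_spares < 0 or active < 1:
        raise ValueError("invalid number of disks or hot spares")
    if scheme == "jbod":
        return active              # no protection, all capacity usable
    if scheme == "mirror":
        return active // 2         # every block is stored twice
    if scheme == "raid5":
        if active < 2:
            raise ValueError("RAID 5 needs data disks plus one parity disk")
        return active - 1          # N+P array offers N disks of storage
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for scheme, spares in (("jbod", 0), ("mirror", 0), ("raid5", 0), ("raid5", 1)):
        print(scheme, "hot spares:", spares, "->", usable_disks(8, scheme, spares))
```

With eight disks, this reproduces the 7+P example from the text (seven
disks of protected storage) and shows that reserving one hot spare reduces
the array to 6+P.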

Table 4 has more information on disk storage choices. 


Table 4. Disk Storage Subsystems

Disk
Storage   Description

2100      The Versatile Storage Server (VSS) offers the ability to share
          disks with up to 64 hosts through Ultra SCSI connections. The
          hosts can be RS/6000, NT, AS/400, and other UNIX platforms. The
          VSS has a protected storage capacity of up to 2 TB. It can be
          connected through multiple Ultra SCSI busses (up to 16) for
          increased throughput and has up to 6 GB of read cache.
          Internally, SSA disks are configured in RAID 5 arrays with fast
          write cache availability. The 7133 is an integral part of VSS.
          Your existing 7133 SSA disks can be placed under control of the
          VSS. They can remain in their current racks, or they can be
          placed in the VSS enclosures.

          Disks are configured into 6+P+S or 7+P RAID 5 arrays with at
          least one hot spare per loop and typically one 7133 drawer per
          SSA loop. These RAID 5 arrays are then divided into LUNs
          (logical units) with valid LUN sizes of 0.5, 1, 2, 4, 8, 12, 16,
          20, 24, 28, and 32 GB. Each LUN is an hdisk in the RS/6000.

7027      The 7027 High Capacity Storage Drawer provides up to a maximum
          of 67.5 GB of disk storage plus three tape or CD-ROM bays, all
          in a single rack drawer. Supporting SCSI-2 Fast/Wide
          single-ended and SCSI-2 Fast/Wide differential, the 7027 can
          attach to Micro Channel-based RS/6000 systems. Offering hot-swap
          disk and remote power-on capabilities, it offers exceptional
          performance in storage expansion and growth.

7131      The tower has five hot swappable slots for 4.5 or 9.1 GB disk
          drives for a maximum 45.5 GB capacity. Two towers can provide a
          low cost mirrored solution.

7133      If you require high performance, the 7133 Serial Storage
          Architecture (SSA) Disk might be the subsystem for you. SSA
          provides better interconnect performance than SCSI and offers
          hot pluggable drives, cables, and redundant power supplies.
          RAID 5, including hot spares, is supported on some adapters, and
          loop cabling provides redundant data paths to the disk. Two
          loops of up to 48 disks are supported on each adapter. However,
          for best performance of randomly accessed drives, you should
          have only 16 drives (one drawer or 7133) in a loop.

7137      The 7137 subsystem supports both RAID 0 and RAID 5 modes. It
          can hold from 4 to 33 GB of data (29 GB maximum in RAID 5 mode).
          The 7137 is the low end model of RAID support. Connection is
          through SCSI adapters. If performance is not critical, but
          reliability and low cost are important, this is a good choice.


In summary, to determine what configuration best suits your needs, you must 
be prepared with the following information: 

• The amount of storage space you need for your data. 

• A protection strategy (mirroring, RAID 5), if any. 

• The I/O rate you require for storage performance. 

• Any other requirements, such as multi-hosts connections, or if you plan to 
use the Recoverable Virtual Shared Disk component of PSSP, which 
needs twin-tailed disks. 

You can find up-to-date information about the available storage subsystems
on the Internet at: http://www.storage.ibm.com

Figure 57 on page 110 shows external device configurations that can be
connected to an SP system.

Figure 57. External devices


4.2.2 Internal I/O adapters 

There are two types of internal I/O adapters you could use in your 
configuration depending on the types of nodes you have: PCI adapters and 
MCA adapters. 

4.2.2.1 PCI adapters 

This section provides information on RS/6000 SP system PCI adapters. The 
following features are installed in the SP nodes and are used to connect the 
SP system with external networks. Network connections through SP nodes 
are typically slower than network connections through an SP Switch router. 
Also, network connections through a node may not have the availability of 
those through an SP Switch router. 


Table 5 has more information on available PCI adapters. 

Table 5. Available PCI adapter features

Feature
Code      PCI Adapter Name

2741      FDDI SK-NET LP SAS
2742      FDDI SK-NET LP DAS
2743      FDDI SK-NET UP DAS Adapter
2751      S/390 ESCON Channel Adapter
2920      Token-Ring Auto LANstreamer Adapter
2943      EIA 232/RS-422 8-port Asynchronous Adapter
2944      WAN EIA-RS232 128-port Asynchronous Adapter
2947      ARTIC960Hx 4-port Selectable Adapter
2962      2-port Multiprotocol X.25 Adapter
2963      ATM 155 TURBOWAYS UTP Adapter
2968      Ethernet 10/100 10BaseTX Adapter
2969      Gigabit Ethernet - SX Adapter
2985      Ethernet 10Base2 and 10BaseT (BNC/RJ-45) LAN Adapter
2987      Ethernet 10Base5 and 10BaseT (AUI/RJ-45) LAN Adapter
2988      TURBOWAYS 155 ATM Adapter
4959      High-Speed Token Ring Adapter
6204      SCSI-2 Ultra/Wide DE Adapter
6205      Dual Channel Ultra2 SCSI Adapter
6206      SCSI-2 Ultra/Wide SE Adapter
6208      SCSI-2 F/W Single-Ended Adapter
6209      SCSI-2 F/W Differential Adapter
6215      SSA RAID 5 Adapter
6222      SSA Fast-Write Cache Option
6227      Gigabit Fibre Channel Adapter
6230      Advanced SerialRAID Plus Adapter
6231      128 MB DIMM Option Card
6235      32 MB Fast-Write Cache Option Card
6310      ARTIC960RxD Quad Digital Trunk Adapter
6311      ARTIC960RxF Digital Trunk Resource Adapter


4.2.2.2 MCA adapters

This section provides information on RS/6000 SP system MCA adapters. 

Table 6 has more information on available MCA adapters.

Table 6. Available MCA adapter features

Feature
Code      Adapter Description

2402      IBM Network Terminal Accelerator - 256 Session
2403      IBM Network Terminal Accelerator - 2048 Session
2410      SCSI-2 High Performance External I/O Controller
2412      Enhanced SCSI-2 Differential Fast/Wide Adapter/A
2415      SCSI-2 Fast/Wide Adapter/A
2416      SCSI-2 Differential Fast/Wide Adapter/A
2420      SCSI-2 Differential High Performance External I/O Controller
2700      4-port Multiprotocol Communications Controller
2723      FDDI Dual-Ring Attachment
2724      FDDI Single-Ring Attachment
2735      High Performance Parallel Interface - HIPPI
2754      S/390 ESCON Channel Emulator Adapter
2755      Block Multiplexer Channel Adapter - BMCA
2756      ESCON Control Unit Adapter
2930      8-port Async Adapter-EIA-232
2940      8-port Async Adapter-EIA-422A
2960      X.25 Interface Co-Processor/2
2970      Token Ring High Performance Network Adapter
2972      Auto Token Ring LANstreamer MC 32 Adapter
2980      Ethernet High Performance LAN Adapter
2984      TURBOWAYS 100 ATM Adapter
2989      TURBOWAYS 150 ATM Adapter
2992      High-Performance Ethernet LAN Adapter (AUI/10BaseT)
2993      High-Performance Ethernet LAN Adapter (BNC)
2994      10/100 Ethernet Twisted Pair MC Adapter
4224      Ethernet 10BaseT Transceiver
6212      9333 High Performance Subsystem Adapter
6214      SSA 4-port Adapter
6216      Enhanced SSA 4-port Adapter
6217      SSA 4-port RAID Adapter
6219      Micro Channel SSA Multi-initiator/RAID EL Adapter
6305      Digital Trunk Dual Adapter
7006      Real-time Interface Co-Processor Portmaster Adapter/A
8128      128-port Async Controller


4.3 Multiple rootvg support 

The concept called Multiple Rootvg or Alternate Root Volume Group 
provides the ability to boot a separate volume group on a node. To do this, 
a new SDR class called Volume_Group has been created in PSSP 3.1 to 
store the data. These additional volume groups allow booting of a 
separate version of the operating system on the node. Obviously, before 
using this alternative, you must do as many installations as you need. 
Each installation uses a different Volume_Group name created at the SDR 
level. 

Although the name of these volume groups must be different in the SDR 
because they are different objects in the same class (the first one can be 
rootvg and the following othervg, for example), this name stays in the SDR 
and is not used directly by NIM to install the node. Only the attribute 
Destination Disks is used to create the rootvg node volume group. 

If your node has two (or more) available rootvgs, only one is used to boot;
it is determined by the bootlist of the node. Because the user determines
which version of the operating system to boot, PSSP 3.1 introduces the
ability to change the bootlist of a node directly from the CWS by using
the new spbootlist command.

Another enhancement in PSSP 3.1 is the possibility of mirroring the Root
Volume Group directly from the CWS. Mirroring is writing simultaneous
copies of the operating system's logical volumes to provide redundancy.
Either two or three copies (one or two mirrors) are allowed in AIX. 

The operating system determines which copy of each operating system
logical volume is active based on availability.

Prior to PSSP 3.1, the RS/6000 SP attributes, such as operating system 
level, PSSP level, installation time, and date, were associated with the 
Node object in the SDR. 

In PSSP 3.1 or later, these attributes are more correctly associated with a
volume group. A node is not at AIX 4.3.2, for example; a volume group of
the node is at AIX 4.3.2. To display this information, a new option (-v) has
been added to the splstdata command.

Therefore, part of this feature is to break the connection between nodes
and attributes more properly belonging to a volume group. For this reason,
some information has been moved from the SMIT panel Boot/Install
Server Information to the Create Volume Group Information or the Change
Volume Group Information panel.

We now describe these features and the related commands in more detail. 

4.3.1 The Volume_Group class 

As explained, a new Volume_Group class has been created in PSSP 3.1. 
The following is a list of attributes: 

• node_number 

• vg_name (volume group name) 

• pv_list (one or more physical volumes)

• quorum (quorum is true or false)

• copies (1, 2, or 3)

• install_image (name of the mksysb)

• code_version (PSSP level)

• lppsource_name (which lppsource)

• boot_server (which node serves this volume group) 

• last_install_time (time of last install of this volume group) 

• last_install_image (last mksysb installed on this volume group) 

• last_bootdisk (which physical volume to boot from) 

The attributes pv_list, install_image, code_version, lppsource_name, and
boot_server have been duplicated from the Node class to the
Volume_Group class. New SMIT panels associated with these changes
are detailed in the following sections.

4.3.1.1 The node object 

The new Volume_Group class uses some attributes from the old node 
class. The following list describes the changes made to the Node object: 

• A new attribute is created: selected_vg 

• selected_vg points to the current Volume_Group object. 

• The node object retains all attributes. 

• Now the node attributes common to the Volume_Group object reflect 
the current volume group of the node. 

• The Volume_Group objects associated with a node reflect all the 
possible volume group states of the node. 

-Note- 

All applications using the node object remain unchanged with the exception 
of some SP installation code. 


4.3.1.2 Volume_Group default values 

When the SDR is initialized, a Volume_Group object for every node is 
created. 

By default, the vg_name attribute of the Volume_Group object is set to 
rootvg, and the selected_vg of the node object is set to rootvg. 

The following are the other default values: 

• The default install_disk is hdisk0.

• Quorum is true. 

• Mirroring is off; copies are set to 1. 

• There are no bootable alternate root volume groups. 

• All other attributes of the Volume_Group are initialized according to the 
same rules as the node object. 
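
The default values above can be pictured with a small model. The following Python sketch is only an illustration: the VolumeGroup class and the initial_objects helper are hypothetical names introduced here and are not part of PSSP or the SDR interface. It assumes the defaults listed in this section (rootvg, hdisk0, quorum true, one copy).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VolumeGroup:
    """Illustrative stand-in for one SDR Volume_Group object (not the real SDR API)."""
    node_number: int
    vg_name: str = "rootvg"              # default volume group name
    pv_list: List[str] = field(default_factory=lambda: ["hdisk0"])  # default install disk
    quorum: bool = True                  # quorum is true by default
    copies: int = 1                      # one copy means mirroring is off

    def is_mirrored(self) -> bool:
        # Two or three copies correspond to one or two mirrors.
        return self.copies > 1


def initial_objects(node_numbers):
    """Create one default Volume_Group per node, as described for SDR initialization."""
    return {n: VolumeGroup(node_number=n) for n in node_numbers}


sdr = initial_objects([1, 5, 9])
print(sdr[5])
```

Each node gets its own object, so changing the physical volume list of one node in this model does not affect the others.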

4.3.2 Volume group management commands 

Having described the new volume group management features available in
PSSP 3.1 or later, we now describe the commands used to create, change,
delete, mirror, and unmirror Volume_Group objects. Changes to commands
that existed before PSSP 3.1 are also described.

4.3.2.1 spmkvgobj 

All information needed by NIM, such as lppsource, physical disk, server,
mksysb, and so forth, is now moved from Boot/Install Server Information to
a new panel accessible by the fast path createvg_dialog as shown in
Figure 58.


                  Create Volume Group Information

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                          [Entry Fields]
  Start Frame                             []                             #
  Start Slot                              []                             #
  Node Count                              []                             #

      OR

  Node List                               [10]

  Volume Group Name                       [rootvg]
  Physical Volume List                    [hdisk0,hdisk1]
  Number of Copies of Volume Group        1                              +
  Boot/Install Server Node                [0]                            #
  Network Install Image Name              [bos.obj.mksysb.aix432.090898]
  LPP Source Name                         [aix432]
  PSSP Code Version                       PSSP-3.1                       +
  Set Quorum on the Node                                                 +

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 58. New SMIT panel to create a volume group 

The associated command of this SMIT panel is spmkvgobj, whose options 
are: 

-r vg_name
-l node_list
-h pv_list
-i install_image
-v lppsource_name
-p code_version
-n boot_server
-q quorum
-c copies

The following command built by the previous SMIT panel is a good 
example of the use of spmkvgobj: 

/usr/lpp/ssp/bin/spmkvgobj -l '10' -r 'rootvg' -h 'hdisk0,hdisk1' -n
'0' -i 'bos.obj.mksysb.aix432.090898' -v 'aix432' -p 'PSSP-3.1'
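
As a rough illustration of how the SMIT fields map to spmkvgobj options, the following Python sketch assembles the same argument list. The build_spmkvgobj helper and the FLAGS table are hypothetical names used only for this example; the flag letters are those listed in this section (the node list flag is a lowercase L). The sketch only builds and prints the command line; it does not run it.

```python
import shlex

# Flag letters as listed for spmkvgobj in this section.
FLAGS = {
    "vg_name": "-r",
    "node_list": "-l",
    "pv_list": "-h",
    "install_image": "-i",
    "lppsource_name": "-v",
    "code_version": "-p",
    "boot_server": "-n",
    "quorum": "-q",
    "copies": "-c",
}


def build_spmkvgobj(**attrs):
    """Return the argv list for spmkvgobj; unknown attribute names raise KeyError."""
    argv = ["/usr/lpp/ssp/bin/spmkvgobj"]
    for name, value in attrs.items():
        argv += [FLAGS[name], str(value)]
    return argv


argv = build_spmkvgobj(
    node_list=10,
    vg_name="rootvg",
    pv_list="hdisk0,hdisk1",
    boot_server=0,
    install_image="bos.obj.mksysb.aix432.090898",
    lppsource_name="aix432",
    code_version="PSSP-3.1",
)
print(shlex.join(argv))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if the list is later handed to a process-spawning call.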

Here is more information about the -h option: For PSSP levels prior to
PSSP 3.1, two formats were supported to specify the SCSI disk drive.
Both remain usable:

• Hardware location format 

00-00-00-0,0 to specify a single SCSI disk drive, or 
00-00-00-0,0:00-00-00-1,0 to specify multiple hardware locations (in 
that case, the colon is the separator). 

• Device name format 

hdisk0 to specify a single SCSI disk drive, or hdisk0,hdisk1 to specify
multiple disk drives (in that case, the comma is the separator).

You must not use this format when specifying an external disk because 
the relative location of hdisks can change depending on what hardware 
is currently installed. It is possible to overwrite valuable data by 
accident. 

A third format is now supported to be able to boot on SSA external disks,
which is a combination of the parent and connwhere attributes for SSA
disks from the Object Data Management (ODM) CuDv. In the case of SSA
disks, the parent always equals ssar. The connwhere value is the
15-character unique serial number of the SSA drive: the last 12 digits of
the disk ID stamped on the side of the drive, followed by a three-character
suffix (always 00D for a disk). If the disk drive
has already been defined, the unique identity may be determined using
SMIT panels or by following these two steps:

• Issue the command:

lsdev -Cc pdisk -r connwhere

• Select the 15-character unique identifier whose characters 5 to 12
match those on the front of the disk drive.

For example, to specify the parent-connwhere attribute, you can enter: 

ssar//0123456789AB00D 

Or, to specify multiple disks, separate using colons as follows: 

ssar//0123456789AB00D:ssar//0123456789FG00D 

- Important -

The ssar identifier must have a length of 21 characters. 

Installation on external SSA disks is supported in PSSP 3.1 or later. 
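
The three physical volume formats can be told apart mechanically. The following Python sketch is a hedged illustration: classify_pv_list is a hypothetical helper, and the regular expressions are assumptions derived only from the examples in this section (they are not an official grammar). The SSA pattern enforces the 21-character length: ssar// plus 12 characters plus 00D.

```python
import re

HW_LOCATION = re.compile(r"^\d{2}-\d{2}-\d{2}-\d+,\d+$")  # e.g. 00-00-00-0,0
DEVICE_NAME = re.compile(r"^hdisk\d+$")                    # e.g. hdisk0
SSA_CONN = re.compile(r"^ssar//[0-9A-Za-z]{12}00D$")       # parent//connwhere, 21 chars


def classify_pv_list(pv_list):
    """Return 'location', 'device', or 'ssa' for a -h value; raise ValueError otherwise."""
    for kind, pattern, sep in (
        ("location", HW_LOCATION, ":"),   # colon-separated hardware locations
        ("device", DEVICE_NAME, ","),     # comma-separated device names
        ("ssa", SSA_CONN, ":"),           # colon-separated SSA identifiers
    ):
        if all(pattern.match(item) for item in pv_list.split(sep)):
            return kind
    raise ValueError(f"unrecognized physical volume list: {pv_list!r}")


for value in ("00-00-00-0,0:00-00-00-1,0",
              "hdisk0,hdisk1",
              "ssar//0123456789AB00D:ssar//0123456789FG00D"):
    print(value, "->", classify_pv_list(value))
```

A check of this kind could be run before calling spmkvgobj or spchvgobj to catch a mistyped identifier early.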


4.3.2.2 spchvgobj 

After a Volume_Group has been created by the spmkvgobj command, you 
may want to change some information. Use the spchvgobj command or the 
new SMIT panel (fastpath is changevg_dialog) shown in Figure 59. 

This command uses the same options as the spmkvgobj command. The 
following is an example built by the SMIT panel: 

/usr/lpp/ssp/bin/spchvgobj -l '1' -r 'rootvg' -h
'hdisk0,hdisk1,hdisk2' -c '2' -p 'PSSP-3.1'


                  Change Volume Group Information

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                          [Entry Fields]
  Start Frame                             []                             #
  Start Slot                              []                             #
  Node Count                              []                             #

      OR

  Node List                               [1]

  Volume Group Name                       [rootvg]
  Physical Volume List                    [hdisk0,hdisk1,hdisk2]
  Number of Copies of Volume Group        2                              +
  Set Quorum on the Node                                                 +
  Boot/Install Server Node                []                             #
  Network Install Image Name              []
  LPP Source Name                         []
  PSSP Code Version                       PSSP-3.1                       +

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do


Figure 59. New SMIT panel to modify a volume group 

- Note -

To verify the content of the Volume_Group class of node 1, you can issue
the following SDR command:

SDRGetObjects Volume_Group node_number==1 vg_name pv_list copies


4.3.2.3 sprmvgobj 

To complete the management of the Volume_Group class, a third command,
sprmvgobj, has been added. It removes a Volume_Group object that is not
the current one.

This command accepts the following options:

-r vg_name
-l node_list

Regarding SMIT, the Delete Database Information SMIT panel has been
changed to access the new SMIT panel named Delete Volume Group
Information (the fastpath is deletevg_dialog).

Refer to Figure 60 for details. 



                  Delete Volume Group Information

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                          [Entry Fields]
  Start Frame                             []                             #
  Start Slot                              []                             #
  Node Count                              []                             #

      OR

  Node List                               [1]

  Volume Group Name                       [rootvg2]

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 60. New SMIT panel to delete a volume group 

The following is an example built by the SMIT panel used in Figure 60: 

/usr/lpp/ssp/bin/sprmvgobj -l '1' -r 'rootvg2'

4.3.2.4 Changes to spbootins in PSSP 3.1 or later

The spbootins command sets various node attributes in the SDR 
(code_version, lppsource_name, and so forth). 

By using the spbootins command, you can select a volume group from all 
the possible volume groups for the node in the Volume_Group class. 

Attributes shared between the node and Volume_Group objects are
changed using a new set of Volume_Group commands, not by using
spbootins.

The new spbootins syntax is as follows:

spbootins

-r <install|diag|maintenance|migrate|disk|customize>

-l <node_list>

-c <selected_vg> 

-s <yes|no> 

spbootins no longer has the following flags: 

-h <install_disk> 

-n <boot_server> 

-v <lppsource_name> 

-i <install_image_name> 

-p <PSSP_level> 

-u <usr_server_id> 

-g <usr_gateway_id> 

-a <interface name> 

- Note - 

-u, -g, and -a flags were dropped because PSSP 3.1 no longer 
supports /usr servers. 

Figure 61 on page 121 shows the new SMIT panel to issue spbootins (the 
fastpath is server_dialog). 

                  Boot/Install Server Information

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                          [Entry Fields]
  Start Frame                             []                             #
                                          []                             #
  Node List

Figure 61. New SMIT panel to issue the spbootins command


You get the same result by issuing the following from the command line: 

spbootins -l 10 -r install -c rootvg -s yes

Note that the value yes is the default for the -s option; in this case, the 
script setup_server is run automatically. 

4.3.2.5 spmirrorvg 

This command enables mirroring on a set of nodes given by the option:

-l node_list

You can force (or not force) the extension of the volume group by using 
the -f option (available values are: yes or no). 

This command takes the volume group information from the SDR updated 
by the last spchvgobj and spbootins commands. 

Note: 

You can add a new physical volume to the node rootvg by using the 
spmirrorvg command; the following steps give the details: 

• Add a physical disk to the actual rootvg in the SDR by using 
spchvgobj without changing the number of copies. 

• Run spmirrorvg.

Figure 62 on page 122 shows the new SMIT panel to issue spmirrorvg (the 
fastpath is start_mirroring). 

                  Initiate Mirroring on a Node

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                          [Entry Fields]
  Start Frame                             []                             #
  Start Slot                              []                             #
  Node Count                              []                             #

      OR

  Node List                               [1]

  Force Extending the Volume Group?       no                             +

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 62. New SMIT panel to initiate the spmirrorvg command 

The following is an example built by the SMIT panel in Figure 62: 

/usr/lpp/ssp/bin/spmirrorvg -l '1'

For more detail regarding the implementation of mirroring root volume 
groups, refer to Appendix B of the manual PSSP: Administration Guide , 
SA22-7348. 


- Note - 

This command uses the dsh command to run the AIX-related commands
on the nodes. 


4.3.2.6 spunmirrorvg 

This command disables mirroring on a set of nodes given by the option:

-l node_list

Figure 63 shows the new SMIT panel to issue spunmirrorvg (the fastpath is
stop_mirroring).

Figure 63. New SMIT panel to initiate the spunmirrorvg command

The following is the example built by the SMIT panel in Figure 63:

/usr/lpp/ssp/bin/spunmirrorvg -l '1'


- Note -

This command uses the dsh command to run the AIX-related commands
on the nodes.


4.3.2.7 Changes to splstdata in PSSP 3.1 or later 

splstdata can now display information about Volume_Groups using the
new option:

-v

Figure 64 shows the information related to node 1 in the result of the
command: splstdata -v -l 1

                  List Volume Group Information

node# name     boot_server quorum copies code_version lppsource_name
      last_install_image last_install_time            last_bootdisk
      pv_list

    1 rootvg   0           true   1      PSSP-3.1     aix432
      default            Thu_Sep_24_16:47:50_EDT_1998 hdisk0
      hdisk0

    1 rootvg2  0           true   1      PSSP-3.1     aix432
      default            Fri_Sep_25_09:16:44_EDT_1998 hdisk3
      ssar//0004AC50532100D:ssar//0004AC50616A00D

    1 jmbvg    0           true   1      PSSP-3.1     aix432
      default            Fri_Sep_25_11:50:47_EDT_1998 hdisk0
      ssar//0004AC5150BA00D

Figure 64. Example of splstdata -v 

4.3.2.8 spbootlist 

spbootlist sets the bootlist on a set of nodes by using the option:

-l node_list

This command takes the volume group information from the SDR updated 
by the last spchvgobj and spbootins commands. 

Section 4.3, “Multiple rootvg support” on page 113 gives information on 
how to use this new command. 

4.3.3 How to declare a new rootvg 

Several steps must be done in the right order; they are the same as for an 
installation. The only difference is that you must enter an unused volume 
group name. 

The related SMIT panel or commands are given in Figure 58 on page 116 and 
Figure 61 on page 121. 

At this point, the new volume group is declared, but it is not usable. You must 
now install it using a Network Boot, for example. 

4.3.3.1 How to activate a new rootvg 

Several rootvgs are available on your node. To activate one of them, the
bootlist has to be changed by using the spbootlist command or the related
SMIT panel (the fastpath is bootlist_dialog) as shown in Figure 65 on page
126. Because the spbootlist command takes information from the node boot
information given by splstdata -b, this information must first be changed by
issuing the spbootins command. Once the change is effective, you can issue
the spbootlist command.

Verify your node bootlist by issuing the command: 

dsh -w <node> 'bootlist -m normal -o' 

Then, reboot the node. 

The following example gives the steps to follow to activate a new rootvg on
node 1 (hostname is node01). We assume two volume groups (rootvg1 and
rootvg2) have already been installed on the node, and rootvg1 is the active
rootvg.

1. Change the node boot information:

spbootins -l 1 -c rootvg2 -s no

2. Note that it is not necessary to run setup_server.

3. Verify:

splstdata -b

4. Change the node bootlist:

spbootlist -l 1

5. Verify:

dsh -w node01 'bootlist -m normal -o'

6. Reboot the node:

dsh -w node01 'shutdown -Fr'

- Important - 

The key switch must be in the normal position. 
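
The activation steps can be collected into one ordered list, which is handy when the same switch has to be repeated for several nodes. The following Python sketch is an illustration only: activation_commands is a hypothetical helper that returns the command lines shown in the steps above (with lowercase L for the node list flag) and does not execute anything.

```python
def activation_commands(node_number, hostname, vg_name):
    """Return the command lines, in order, for switching a node to another rootvg."""
    return [
        f"spbootins -l {node_number} -c {vg_name} -s no",  # select the volume group in the SDR
        "splstdata -b",                                     # verify the node boot information
        f"spbootlist -l {node_number}",                     # set the bootlist on the node
        f"dsh -w {hostname} 'bootlist -m normal -o'",       # verify the bootlist on the node
        f"dsh -w {hostname} 'shutdown -Fr'",                # reboot into the selected rootvg
    ]


for line in activation_commands(1, "node01", "rootvg2"):
    print(line)
```

The order matters: spbootlist reads the selection made by spbootins, so the bootlist must not be set before the SDR has been updated.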

Figure 65. SMIT panel for the spbootlist command


4.3.4 Booting from external disks 

Support has been included in PSSP 3.1 for booting an SP node from an 
external disk. The disk subsystem can be either external Serial Storage 
Architecture (SSA) or external Small Computer Systems Interface (SCSI). 
The option to have an SP node without an internal disk storage device is now 
supported. 

4.3.4.1 SSA disk requirements 

Figure 66 and Figure 67 on page 127 show the SSA disk connections to a 
node. 

Figure 66. Cabling SSA disks to RS/6000 SP nodes

Figure 67. Connections on the SSA disks

Not all node types can support an SSA boot. Table 7 shows the node types
that support an SSA boot. 

Table 7. Supported adapters for nodes with full SSA boot

Node Feature   Node Type       Feature Code Numbers of
Code                           Supported SSA Adapters

2053           POWER3 Wide     F/C 6225, 6230
2052           POWER3 Thin     F/C 6225, 6230
2051           332 MHz Wide    F/C 6225, 6230
2050           332 MHz Thin    F/C 6225, 6230

Reference information for withdrawn nodes

2005           77 MHz Wide     F/C 6214, 6216, 6217, 6219
2006           604 High        F/C 6214, 6216, 6217, 6219
2007           120 MHz Thin    F/C 6214, 6216, 6217, 6219
2008           135 MHz Wide    F/C 6214, 6216, 6217, 6219
2009           604e High       F/C 6214, 6216, 6217, 6219
2022           160 MHz Thin    F/C 6214, 6216, 6217, 6219


The SP-supported external SSA disk subsystems are: 

7133 IBM Serial Storage Architecture Disk Subsystems Models 010, 020, 
500, and 600. 

4.3.4.2 SCSI disk requirements

Some nodes can now be booted from an external SCSI-2 Fast/Wide disk 
7027-HSD storage device. Not all nodes can support an SCSI boot. Table 8 
lists the nodes and the adapters for external disk booting. 

Table 8. Supported adapters for nodes with SCSI boot

Node Feature   Node Type       Feature Code Numbers of
Code                           Supported SCSI Adapters

2053           POWER3 Wide     F/C 6207, 6209
2052           POWER3 Thin     F/C 6207, 6209
2051           332 MHz Wide    F/C 6207, 6209
2050           332 MHz Thin    F/C 6207, 6209

Reference information for withdrawn nodes

2002           66 MHz Thin     F/C 2412, 2416, 2420
2003           66 MHz Wide     F/C 2412, 2416, 2420
2004           66 MHz Thin 2   F/C 2412, 2416, 2420
2005           77 MHz Wide     F/C 2412, 2416, 2420
2006           604 High        F/C 2412, 2416, 2420
2007           120 MHz Thin    F/C 2412, 2416, 2420
2008           135 MHz Wide    F/C 2412, 2416, 2420
2009           604e High       F/C 2412, 2416, 2420
2022           160 MHz Thin    F/C 2412, 2416, 2420

Notes:

1. F/C 2416 and 2420 are withdrawn from production.


The SP-supported external SCSI disk subsystems are: 


7027-HSD IBM High Capacity Drawer with an SP SCSI-DE/FW adapter for 
Micro Channel machines or SP Ultra-SCSI adapter for PCI machines. 

4.3.4.3 Specifying an external installation disk 

During the node installation process, external disk information may be 
entered in the SDR by first typing the SMIT fastpath smitty node_data. 
Depending on whether you have already created the Volume_Group, you 
must then choose Create Volume Group Information or Change Volume 
Group Information from the Node Database Information Window (related 
commands are spmkvgobj or spchvgobj). Alternatively, you may use the SMIT
fastpath smitty changevg_dialog (refer to Figure 59 on page 118) to get
straight there.

Figure 68 shows the Change Volume Group Information window. In this, the
user is specifying an external SSA disk as the destination for rootvg on
node 1. Note that you may specify several disks in the Physical Volume List
field (refer to 4.3.2.1, “spmkvgobj” on page 116 for more information on how
to enter the information).


                  Change Volume Group Information

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                     [Entry Fields]
  Start Frame                             []                             #
  Start Slot                              []                             #
  Node Count                              []                             #

      OR

  Node List                               [1]

  Volume Group Name                       [rootvg]
  Physical Volume List                    [ssar//0004AC50532100D]
  Number of Copies of Volume Group        1                              +
  Set Quorum on the Node                                                 +
  Boot/Install Server Node                []                             #
  Network Install Image Name              []
[MORE...2]

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 68. SMIT panel to specify an external disk for SP node installation 

When you press the Enter key in the Change Volume Group Information 
window, the external disk information is entered in the Node class in the SDR. 
This can be verified by running the splstdata -b command as shown in
Figure 69 on page 131. This shows that the install disk for node 1 has been
changed to ssar//0004AC50532100D.


Under the covers, smitty changevg_dialog runs the spchvgobj command. This
is a new command in PSSP 3.1 that recognizes the new external disk
address formats. It may be run directly from the command line using this
syntax:

spchvgobj -r rootvg -h ssar//0004AC50532100D -l 1

sp3en0{ / } splstdata -b -l 1

              List Node Boot/Install Information

node# hostname          hdw_enet_addr srvr response install_disk
      last_install_image last_install_time next_install_image lppsource_name
      pssp_ver selected_vg

    1 sp3n01.msc.itso.  02608CE8D2E1  0    install  ssar//0004AC510D1E00D
      default            initial           default            aix432
      PSSP-3.1 rootvg


Figure 69. Output of the splstdata -b command 

4.3.4.4 Changes to the bosinst.data file 

When the changes have been made to the Node class in the SDR to specify 
an external boot disk, the node can be set to install with the spbootins 
command: 

spbootins -s yes -r install -l 1

The setup_server command will cause the network install manager (NIM) 
wrappers to build a new bosinst.data resource for the node, which will be 
used by AIX to install the node. 

The format of bosinst.data has been changed in AIX 4.3.2 to include a new
member of the target_disk_data stanza specified as CONNECTION=. This is
shown in Figure 70 on page 132 for node 1’s bosinst.data file (node 1 was
used as an example node in Figure 68 on page 130 and Figure 70 on page
132). NIM puts in the new CONNECTION= member when it builds the file.

control_flow:
    CONSOLE = /dev/tty0
    INSTALL_METHOD = overwrite
    PROMPT = no
    EXISTING_SYSTEM_OVERWRITE = yes
    INSTALL_X_IF_ADAPTER = no
    RUN_STARTUP = no
    RM_INST_ROOTS = no
    ERROR_EXIT =
    CUSTOMIZATION_FILE =
    TCB = no
    INSTALL_TYPE = full
    BUNDLES =

target_disk_data:
    LOCATION =
    SIZE_MB =
    CONNECTION = ssar//0004AC50532100D

locale:
    BOSINST_LANG = en_US
    CULTURAL_CONVENTION = en_US
    MESSAGES = en_US
    KEYBOARD = en_US


Figure 70. bosinst.data file with the new CONNECTION attribute 
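
To check which disk a generated bosinst.data file targets, the stanza layout shown in Figure 70 can be read with a few lines of code. The following Python sketch is a minimal illustration under stated assumptions: parse_stanzas is a hypothetical helper, the sample text is an abbreviated copy of the figure, and a file with several stanzas of the same name would have them merged by this simple parser.

```python
def parse_stanzas(text):
    """Parse 'name:' stanzas with 'KEY = value' members into nested dicts."""
    stanzas, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A stanza header such as 'target_disk_data:' starts a new section.
            current = stanzas.setdefault(line[:-1], {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return stanzas


sample = """\
control_flow:
    INSTALL_METHOD = overwrite
    PROMPT = no
target_disk_data:
    LOCATION =
    SIZE_MB =
    CONNECTION = ssar//0004AC50532100D
"""

target = parse_stanzas(sample)["target_disk_data"]
print(target["CONNECTION"])
```

An empty LOCATION together with a filled CONNECTION member indicates that the node is to be installed on the external SSA disk.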


4.4 Global file systems 

This section gives an overview of the most common global file systems. A 
global file system is a file system that resides locally on one machine (the file 
server) and is made globally accessible to many clients over the network. All 
file systems described in this section use UDP/IP as the network protocol for 
client/server communication (NFS Version 3 may also use TCP). 

One important motivation to use global file systems is to give users the 
impression of a single system image by providing their home directories on all 
the machines they can access. Another is to share common application 
software that then needs to be installed and maintained in only one place. 
Global file systems can also be used to provide a large scratch file system to 
many machines, which normally utilizes available disk capacity better than 
distributing the same disks to the client machines and using them for local 
scratch space. However, the latter normally provides better performance; so,
a trade-off has to be made between speed and resource utilization. 

Apart from the network bandwidth, an inherent performance limitation of 
global file systems is the fact that one file system resides completely on one 
machine. Different file systems may be served by different servers, but 
access to a single file, for example, will always be limited by the I/O 
capabilities of a single machine and its disk subsystems. This might be an 
issue for parallel applications where many processes/clients access the same 
data. To overcome this limitation, a parallel file system has to be used. IBM’s 
parallel file system for the SP is described in 12.4, “General Parallel File 
Systems” on page 341. 

4.4.1 Network File System (NFS) 

Sun Microsystems' Network File System (NFS) is a widely used global file
system, which is available as part of the base AIX operating system. It is 
described in detail in Chapter 10, "Network File System" of AIX Version 4.3 
System Management Guide: Communications and Networks, SC23-4127. 

In NFS, file systems residing on the NFS server are made available through 
an export operation either automatically when the NFS start-up scripts 
process the entries in the /etc/exports file or explicitly by invoking the exportfs 
command. They can be mounted by the NFS clients in three different ways. A 
predefined mount is specified by stanzas in the /etc/filesystems file, an 
explicit mount can be performed by manually invoking the mount command, 
and automatic mounts are controlled by the automount command, which 
mounts and unmounts file systems based on their access frequency. This 
relationship is sketched in Figure 71 on page 134. 

/etc/filesystems

/home/joe:
        dev      = /export/joe
        nodename = nfs_srv
        mount    = true
        vfs      = nfs

mount nfs_srv:/export/tmp /home/tmp

/etc/auto.master

/home   /etc/auto/maps/home.maps

/etc/auto/maps/home.maps

tina    nfs_srv:/export/tina

client                    nfs_srv


Figure 71. Conceptual overview of NFS mounting process 
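
For predefined mounts, the stanza format shown for /etc/filesystems can be generated rather than typed by hand. The following Python sketch is only an illustration: nfs_stanza is a hypothetical helper, and the member names (dev, nodename, mount, vfs) are the ones that appear in the figure. It renders text and does not modify any file.

```python
def nfs_stanza(mount_point, server, remote_dir, mount_automatically=True):
    """Render an /etc/filesystems stanza for a predefined NFS mount."""
    members = [
        ("dev", remote_dir),        # directory exported by the server
        ("nodename", server),       # name of the NFS server
        ("mount", "true" if mount_automatically else "false"),
        ("vfs", "nfs"),             # virtual file system type
    ]
    lines = [f"{mount_point}:"]
    lines += [f"\t{key:<10}= {value}" for key, value in members]
    return "\n".join(lines) + "\n"


stanza = nfs_stanza("/home/joe", "nfs_srv", "/export/joe")
print(stanza)
```

Passing a server alias instead of a real host name keeps such generated stanzas valid if the file system later moves to another server.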

The PSSP software uses NFS for network installation of the SP nodes. The 
control workstation and boot/install servers act as NFS servers to make 
resources for network installation available to the nodes, which perform 
explicit mounts during installation. The SP accounting system also uses 
explicit NFS mounts to consolidate accounting information. 

NFS is often used operationally to provide global file system services to users 
and applications. Among the reasons for using NFS is the fact that it is part of 
base AIX, it is well-known in the UNIX community, very flexible, and relatively 
easy to configure and administer in small to medium-sized environments. 
However, NFS also has a number of problems. They are summarized below 
to provide a basis to compare NFS to other global file systems. 

Performance:   NFS Version 3 contains several improvements over NFS
               Version 2. The most important change is that NFS Version
               3 no longer limits the buffer size to 8 KB, thus improving
               its performance over high bandwidth networks. Other
               optimizations include the handling of file attributes and
               directory lookups and increased write throughput by
               collecting multiple write requests and writing the collective
               data to the server in larger requests.

Security:      Access control to NFS files and directories is by UNIX
               mode bits, that is, by UID. Any root user on a machine
               that can mount an NFS file system can create accounts
               with arbitrary UIDs and, therefore, can access all
               NFS-mounted files. File systems may be exported
               read-only if none of the authorized users need to change
               their contents (such as directories containing application
               binaries), but home directories will always be exported
               with write permissions, as users must be able to change
               their files. An option for secure NFS exists, but is not
               widely used. Proprietary access control lists (ACLs)
               should not be used since not all NFS clients understand
               them.

Management:    A file system served by an NFS server cannot be moved
               to another server without disrupting service. Even then,
               clients mount it from a specific IP name/address and will
               not find the new NFS server. On all clients, references to
               that NFS server have to be updated. To keep some
               flexibility, alias names for the NFS server should be used
               in the client configuration. These aliases can then be
               switched to another NFS server machine should this be
               necessary.

Namespace:     With NFS, the client decides at which local mount point a
               remote file system is mounted. This means that there are
               no global, universal names for NFS files or directories
               since each client can mount them to different mount
               points.

Consistency:   Concurrent access to data in NFS is problematic. NFS
               does not provide POSIX single site semantics, and
               modifications made by one NFS client will not be
               propagated quickly to all other clients. NFS does support
               byte range advisory locking, but not many applications
               honor such locks.
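
Byte range advisory locking only helps when every cooperating program asks for the lock before touching the data. The following Python sketch shows what such a cooperating writer might look like; it uses the standard fcntl.lockf call on a local temporary file, and update_range is a hypothetical helper name. Whether the lock is propagated to an NFS server depends on the client and server lock services, which this sketch does not exercise.

```python
import fcntl
import os
import tempfile


def update_range(path, offset, data):
    """Overwrite len(data) bytes at offset while holding an advisory lock on that range."""
    with open(path, "r+b") as f:
        # Exclusive advisory lock on [offset, offset + len(data)); blocks until granted.
        fcntl.lockf(f, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
        try:
            f.seek(offset)
            f.write(data)
            f.flush()
        finally:
            # Release the same byte range.
            fcntl.lockf(f, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
update_range(tmp.name, 4, b"XY")
with open(tmp.name, "rb") as f:
    result = f.read()
os.unlink(tmp.name)
print(result)
```

A process that writes to the same range without calling lockf is not stopped by this lock, which is exactly the weakness described above.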


Given these shortcomings, it is not recommended to use NFS in large 
production environments that require fast, secure, and easy to manage global 
file systems. On the other hand, NFS administration is fairly easy, and small 
environments with low security requirements will probably choose NFS as 
their global file system. 





4.4.2 The DFS and AFS file systems 

There are mainly two global file systems that can be used as an alternative to 
NFS. The Distributed File System (DFS) is part of the Distributed Computing 
Environment (DCE) from the Open Software Foundation (OSF), now known 
as the Open Group. The Andrew File System (AFS) from Transarc is the base 
technology from which DFS was developed; so, DFS and AFS are in many 
aspects very similar. Neither DFS nor AFS is part of base AIX; both are
available as separate products. Availability of DFS and AFS for platforms
other than AIX differs but not significantly.

For reasons that will be discussed later, we recommend using DFS rather
than AFS except when an SP is to be integrated into an existing AFS cell. We,
therefore, limit the following high-level description to DFS. Most of these
general features also apply to AFS, which has very similar functionality.
After a general description of DFS, we point out some of the differences
between DFS and AFS that justify our preference for DFS.

4.4.2.1 What is the Distributed File System? 

DFS is a distributed application that manages file system data. It is an 
application of the Distributed Computing Environment (DCE) in the sense that 
it uses almost all of the DCE services to provide a secure, highly available, 
scalable, and manageable distributed file system. 

DFS data is organized in three levels: 

• Files and directories. These are the same data structures known from 
local file systems, such as the AIX Journaled File System (JFS). DFS 
provides a global namespace to access DFS files as described below. 

• Filesets. A DFS fileset is a group of files and directories that are 
administered as a unit. Examples would be all the directories that belong 
to a particular project. User home directories may be stored in separate 
filesets for each user or may be combined into one fileset for a whole (AIX) 
group. Note that a fileset cannot be larger than an aggregate. 

• Aggregates. An aggregate is the unit of disk storage. It is also the level at 
which DFS data is exported. There can be one or more filesets in a DFS
aggregate. Aggregates cannot be larger than the logical volume in which 
they are contained. 
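
The containment rules above (filesets hold files and directories, aggregates hold filesets, and an aggregate lives in one logical volume) can be sketched as a small data model. This is an illustrative Python sketch only, not DFS code; the class names, the size fields, and the sizes used in any example are assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fileset:
    """A group of files and directories administered as a unit."""
    name: str
    size_mb: int  # hypothetical size figure, for illustration only


@dataclass
class Aggregate:
    """Unit of disk storage and of export; contained in one logical volume."""
    name: str
    size_mb: int
    logical_volume_mb: int
    filesets: List[Fileset] = field(default_factory=list)

    def __post_init__(self) -> None:
        # An aggregate cannot be larger than its logical volume.
        if self.size_mb > self.logical_volume_mb:
            raise ValueError("aggregate larger than its logical volume")

    def add_fileset(self, fileset: Fileset) -> None:
        # A fileset cannot be larger than the aggregate that holds it.
        if fileset.size_mb > self.size_mb:
            raise ValueError("fileset larger than its aggregate")
        self.filesets.append(fileset)
```

An aggregate built this way rejects a fileset that exceeds its own size, which mirrors the two size limits stated in the list.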

The client component of DFS is the cache manager. It uses a local disk cache 
or memory cache to provide fast access to frequently used file and directory 
data. To locate the server that holds a particular fileset, DFS uses the fileset 
location database (FLDB) server. The FLDB server transparently accesses
information about a fileset’s location in the FLDB, which is updated if a fileset
is created or moved to another location.

The primary server component is the file exporter. The file exporter receives 
data requests as DCE Remote Procedure Calls (RPCs) from the cache 
manager and processes them by accessing the local file systems in which the 
data is stored. DFS includes its own Local File System (LFS) but can also 
export other UNIX file systems (although with reduced functionality). It 
includes a token manager to synchronize concurrent access. If a DFS client 
wants to perform an operation on a DFS file or directory, it has to acquire a 
token from the server. The server revokes existing tokens from other clients to
avoid conflicting operations. In this way, DFS is able to provide POSIX
single-site semantics.

Figure 72. Basic DFS components

Figure 72 shows these DFS components. Note that this is an incomplete 
picture. There are many more DFS components like the replication server and 
various management services like the fileset server and the update server. 
More detailed information about DFS can be found in the product 
documentation IBM DCE for AIX: Introduction to DCE and IBM DCE for AIX: 
DFS Administration Guide and Reference.

The following list summarizes some key features of DCE/DFS and can be
used to compare DFS with the discussion in 4.4.1, “Network File System
(NFS)” on page 133.

Performance: DFS achieves high performance through client caching.
The client-to-server ratio is better than with NFS, although
exact numbers depend on the actual applications. Like
NFS, DFS is limited by the performance of a single server.

Security: DFS is integrated with the DCE Security Service, which is
based on Kerberos Version 5. All internal communication
uses the authenticated DCE RPC, and all users and
services that want to use DFS services have to be
authenticated by logging in to the DCE cell (except when
access rights are explicitly granted for unauthenticated
users). Access control is by DCE principal. Root users on
DFS client machines cannot impersonate these DCE
principals. In addition, DCE Access Control Lists can be
used to provide fine-grained control; they are recognized
even in a heterogeneous environment.

Management: Since fileset location is completely transparent to the
client, DFS filesets can be easily moved between DFS
servers. Using DCE’s LFS as the physical file system, this
can even be done without disrupting operation. This is an
invaluable management feature for rapidly growing or
otherwise changing environments. The fact that there is
no local information on fileset locations on the client
means that administering a large number of machines is
much easier than maintaining configuration information on
all of these clients.


Namespace: DFS provides a global, worldwide namespace. The file
system in a given DCE cell can be accessed by the
absolute path /.../cell_name/fs/, which can be abbreviated
as /: (slash colon) within that cell. Access to foreign cells
always requires the full cell name of that cell. The global
namespace ensures that a file will be accessible by the
same name on every DFS client. The DFS client has no
control over mount points; filesets are mounted into the
DFS namespace by the servers. Of course, a client may
use symbolic links to provide alternative paths to a DFS
file, but the DFS path to the data will always be available.

Consistency: Through the use of a token manager, DFS is able to
implement complete POSIX single-site read/write
semantics. If a DFS file is changed, all clients will see the
modified data on their next access to that file. Like NFS,
DFS supports byte range advisory locking.

Operation: To improve availability, DFS filesets can be replicated; that
is, read-only copies can be made available by several
DFS servers. The DFS server processes are monitored
and maintained by the DCE basic overseer server (BOS),
which automatically restarts them as needed.
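
The Namespace entry above states that the absolute path /.../cell_name/fs/ can be abbreviated as /: within the local cell. The following Python sketch illustrates that mapping as plain string handling. The function name and any cell name passed to it are hypothetical; a real DFS client resolves these paths itself, so this is only a way to visualize the abbreviation.

```python
def expand_dfs_path(path: str, local_cell: str) -> str:
    """Expand the '/:' abbreviation to the global '/.../<cell>/fs' form.

    Paths that do not start with '/:' are returned unchanged.
    """
    if path == "/:" or path.startswith("/:/"):
        # Drop the two-character '/:' prefix and prepend the global form.
        return "/.../" + local_cell + "/fs" + path[2:]
    return path
```

Because foreign cells always require the full cell name, the sketch only rewrites the local abbreviation and leaves every other path as it is.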

In summary, many of the problems related to NFS do not exist in DFS or have 
a much weaker impact. DFS is, therefore, more suitable for use in a large 
production environment. On the other hand, DCE administration is not easy 
and requires a lot of training. The necessary DCE and DFS licenses also 
cause extra cost. 

4.4.2.2 Differences of DFS and AFS 

Apart from the availability (and licensing costs) of the products on specific
platforms, there are two main differences between DFS and AFS: the
integration with other services and the mechanism to synchronize concurrent
file access. The following list summarizes these differences:

Authentication AFS uses Kerberos Version 4 in an implementation that 
predates the final MIT Kerberos 4 specifications. 
DCE/DFS uses Kerberos Version 5. For both, the 
availability of other operating system services (such as 
Telnet or X display managers) that are integrated with the 
respective Kerberos authentication system depends on 
the particular platform. 

Authorization DFS and AFS ACLs differ and are more limited in AFS.
For example, AFS can only set ACLs on the directory level,
not on the file level. AFS also cannot grant rights to a user
from a foreign AFS cell, whereas DFS supports ACLs for
foreign users.

Directory Service DCE has the Cell Directory Service (CDS) through which 
a client can find the server(s) for a particular service. The 
DFS client uses the CDS to find the Fileset Location 
Database. There is no fileset location information on the 
client. AFS has no directory service. It relies on a local 
configuration file (/usr/vice/etc/CellServDB) to find the 
Volume Location Database (VLDB), the Kerberos servers, 
and other services.

RPC Both DFS and AFS use Remote Procedure Calls (RPCs)
to communicate over the network. AFS uses Rx from
Carnegie Mellon University. DFS uses the DCE RPC,
which is completely integrated into DCE including security.
AFS cannot use dynamic port allocation. DFS does so by
using the RPC endpoint map.

Time Service DFS uses the DCE Distributed Time Service. AFS clients 
use their cache manager and NTP to synchronize with the 
AFS servers. 

Synchronization Both DFS and AFS use a token manager to coordinate
concurrent access to the file system. However, AFS
revokes tokens from other clients when closing a file,
whereas DFS already revokes the token when opening
the file. This means that DFS semantics conform completely
to local file system semantics, whereas
AFS semantics do not. Nevertheless, AFS
synchronization is better than in NFS, which does not use
tokens at all.
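
The open-time revocation described for DFS can be pictured with a toy model. The Python sketch below is a deliberate simplification under stated assumptions: it tracks a single write token per file, the class and method names are invented for the illustration, and a real token manager is considerably more elaborate.

```python
class TokenManager:
    """Toy model: at most one client holds the write token for a file."""

    def __init__(self) -> None:
        self.holder = {}    # file name -> client currently holding the token
        self.revoked = []   # (file, client) pairs in order of revocation

    def open_for_write(self, client: str, file: str) -> None:
        current = self.holder.get(file)
        if current is not None and current != client:
            # Revoke at open time, so the previous holder must give up
            # its token before the new client starts working on the file.
            self.revoked.append((file, current))
        self.holder[file] = client
```

In this model, a second client opening the same file immediately displaces the first holder, which is the behavior the text attributes to DFS; a close-time scheme would record the revocation only when the writer closes the file.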

DFS is well integrated with the other DCE core services, whereas
AFS requires more configuration and administration work. DFS also
provides file system semantics that are superior to AFS. So, unless an
existing AFS cell is expanded, we recommend using DFS rather than
AFS to provide global file services.


4.5 Related documentation 

This documentation will help you better understand the different concepts and
examples covered in this chapter. We recommend that you take a look at some
of these books in order to maximize your chances of success in the SP
certification exam.

SP Manuals 

RS/6000: Planning Volume 2, GA22-7281. This manual gives you detailed 
explanations on I/O devices. 

RS/6000 SP: Planning Vol 1, Hardware and Physical Environment , 
GA22-7280. This book is the official document for supported I/O adapters. 

SP Redbooks 

Inside The RS/6000 SP, SG24-5145. NFS and AFS concepts are discussed in 
this redbook.

4.6 Sample questions

This section provides a series of questions to help you prepare for
the certification exam. The answers to these questions can be found in
Appendix A.

1. If you attach a tape drive to one of your nodes, which of the following 
statements are true? 

A. All nodes get automatic access to that tape drive. 

B. Tape access is controlled by the file collection admin file. 

C. Any node can be backed up to the tape unit through a named pipe 
using the switch to provide a high speed transport. 

D. The tape needs to be attached to the control workstation. 

2. Not all node types can support SSA boot. Which of the following 
statements are true? 

A. Only external SP-attached servers support external SSA boot. 

B. Only PCI nodes support external SSA boot. 

C. Only MCA nodes support external SSA boot. 

D. All nodes support external SSA boot except SP-attached servers. 

3. PSSP 3.1 or later supports multiple rootvg definitions per node. To activate
a specific rootvg volume group, you have to:

A. Issue the spbootlist command against the node.

B. Issue the spchvgobj command against the node. 

C. Issue the spbootins command against the node. 

D. Issue the spchvg command against the node. 

4. PSSP uses NFS for network installation and home directory services of 
the SP nodes. The control workstation and boot/install servers act as NFS 
servers to make resources for network installation available to the nodes. 
Which of the following statements are false? 

A. Home directories are served by the control workstation by default. 

B. Home directories are served by boot/install servers by default. 

C. The control workstation is always an NFS server.

D. Boot/install servers keep local copies of PSSP software. 

5. Which command enables mirroring on a set of nodes? 

A. spmirrorvg -l node_list

B. spbootins -l <node_list>

C. sprmvgobj -l node_list

D. spmkvgobj -l node_list

6. Which command displays information about Volume_Groups? 

A. splstdata -v -l <node #>

B. spbootins -v <lppsource_name>

C. sprmvgobj -r vg_name 

D. spmkvgobj -h pv_list 

7. When is NFS NOT recommended to be used as the global file system? 

A. Environments with low security requirements. 

B. In a large production environment. 

C. Environments where the administration is fairly easy. 

D. In small environments. 

8. Which of the following statements regarding DFS data organization are 
NOT true? 

A. DFS data is organized in filesets. 

B. DFS data is organized in files and directories. 

C. DFS data is organized in distribution files. 

D. DFS is organized in aggregates. 

9. When the SDR is initialized, a volume group is created for every node. By 
default, the vg_name attribute of the Volume_Group object is set to rootvg, 
and the selected_vg of the node is set to rootvg. Which of the following 
statements are default values? 

A. Quorum is false. 

B. The default install_disk is hdisk1.

C. Mirroring is off, copies are set to 1. 

D. There are bootable, alternate root volume groups. 

10. Which of the following commands can be used to be able to boot using 
SSA external disks? 

A. spbootins 

B. spmkvgobj 

C. spmirrorvg

D. splstdata


4.7 Exercises 

Here are some exercises you may wish to perform: 

1. On a test system that does not affect any users, upgrade to AIX 4.3.2 and 
PSSP 3.1. 

2. On a test system that does not affect any users, list all the Volume_Group 
default values. 

3. On a test system that does not affect any users, create a new volume
group (rootvg1), then activate the new volume group. Hint: Check your level of
AIX and PSSP before the exercise.

4. On a test system that does not affect any users, familiarize yourself with 
the various flags of the spmkvgobj command. 

5. On a test system that does not affect any users, familiarize yourself with 
the various flags of the spbootins command. 

6. On a test system that does not affect any users, familiarize yourself with 
the various flags of the splstdata command.

Chapter 5. SP-Attached server support


PSSP 3.1 provides support for the RS/6000 Enterprise Server Models S70,
S7A, and S80, known as SP-Attached servers. These are high-end, PCI-based
RS/6000 machines and are the first 64-bit SMP architecture nodes that attach
independently to the SP, as they are simply too large to physically reside in an
SP frame.

The main section in this chapter is subdivided into the following five sections: 

1. The system attachment of the SP-Attached server to the SP is discussed 
in “Hardware attachment” on page 145. 

2. Installation and configuration of an SP-Attached server are discussed in 
“Installation and configuration” on page 156. 

3. The PSSP support to SP-Attached server is discussed in “PSSP support” 
on page 162. 

4. User interface panels and commands are discussed in “User interfaces” 
on page 171. 

5. Different attachment scenarios to the SP are discussed in “Attachment 
scenarios” on page 176. 


5.1 Key concepts you should study 

Before taking the SP Certification exam, make sure you understand the 
following concepts: 

• How the SP-Attached servers are connected to the SP (control 
workstation and switch). 

• What are the software levels required to attach an RS/6000 Enterprise 
Server (S70/S7A/S80)? 

• The difference between an SP-Attached server and a dependent node. 

• What are the node, frame, and switch numbering rules when attaching an 
RS/6000 Enterprise server? 


5.2 Hardware attachment 

In this section, we describe the hardware architecture of the SP-Attached 
server and its attachment to the SP system, including areas of potential 
concern of the hardware or the attachment components. 


© Copyright IBM Corp. 2000

5.2.1 Brief RS/6000 Enterprise Server overview

The RS/6000 Enterprise Server Model S70 (7017) is a 64-bit symmetric 
multiprocessing (SMP) system that supports 32- and 64-bit applications 
concurrently. 

Until now, all nodes in an SP environment resided within the slot location of 
an SP frame. However, the SP-Attached server is physically too large to 
reside in an SP frame slot location as it is packaged in two side-by-side rack 
units as shown in Figure 73 on page 147. 

The first unit is a 22w x 41d x 62h-inch (56w x 104d x 157h-cm) Central
Electronics Complex (CEC). The CEC system rack contains: 

• A minimum of one processor card and a maximum of three processor 
cards with a 4-, 8-, or 12-way PowerPC processor configuration. The 
system can contain up to a maximum of 12 processors sharing common 
system memory. 

• Each processor card has four 64-bit processors operating at 125 MHz or
262 MHz.

• A 4 MB ECC L2 cache memory per 125 MHz processor and an 8 MB per
262 MHz processor.

• System memory is controlled through a multiport controller that supports 
up to 20 memory slots. All the system memory is contained in the system 
rack up to a maximum of 16 GB. 

• An operator panel consisting of the display unit, scroll up and down 
push-button, an Enter button, and two indicator LEDs. The power on/off 
button is also located on the operator panel. In addition, it contains a port 
that can be used through an RS-232 cable to communicate to the S70. 
The operator panel is used for selecting boot options and initiating system 
dumps as well as for service functions and diagnostic support of the entire 
system. 

• Reliability from redundant fans, hot-swappable disk drives, power supplies 
and fans, and a built-in service processor. 

The second unit is a standard I/O rack similar in size to the CEC. Each I/O 
rack accommodates up to two I/O drawers with a maximum of four drawers 
per system. Up to three more I/O racks can be added to a system. The base 
I/O drawer contains: 

• Up to 14 PCI slots per drawer. 

• Drawer zero reserves slots two and eight for support of system media. 

• Service processor and hot-pluggable DASD. 


• Drawers one through three are reserved for supported PCI adapters. 

• One fully configured system of four I/O drawers and up to 56 PCI slots. 

• Support for SCSI/SSA six-packs, looped SSA, and SIO. 



[Figure 73 image labels: CEC rack with processors, memory, and power; I/O racks with storage drawers; one I/O drawer required per rack; 56 slots; 29 terabytes of storage.]


Figure 73. The S70 components 

Since the CEC and I/O racks are so large, the SP-Attached server must be 
attached to the SP system externally. 

5.2.2 SP-Attached server attachment 

This section describes the attachment of the SP-Attached server to the SP,
highlighting the potential areas of concern that must be addressed before
installation. The physical attachment is subdivided into three
connections:

• Connections between the CWS and the SP-Attached server are described 
in “Control workstation connections” on page 151. 

• Connections between the SP Frame and the SP-Attached Server are 
described in “SP frame connections” on page 152. 

• An optional connection between the SP Switch and the SP-Attached
server is described in “Switch connection (required in a switched SP
system)” on page 153.

These connections are illustrated in Figure 74.



Figure 74. The S70 attachment to the SP 

Figure 75 on page 149 outlines the two RS-232 connections to the S70
machine.

Figure 75. RS-232 connections to the S70

It is important to note that the size of the S70 prohibits it from being physically
mounted in the SP frame. Since the SP-Attached server is mounted in its own
rack and is directly attached to the CWS using RS-232, the SP system must
view the SP-Attached server as a frame. The SP-Attached server is also
viewed as a node because the PSSP code runs on the machine, it is
managed by the CWS, and you can run standard applications on the
SP-Attached server. Therefore, the SP system views the SP-Attached server
as an object with both frame and node characteristics.

However, as the SP-Attached server does not have full SP frame 
characteristics, it cannot be considered as a standard SP expansion frame. 
Therefore, when assigning the server’s frame number, you have to abide by 
the following rules: 

• The SP-Attached server cannot be the first frame in the SP system.

• The SP-Attached server cannot be inserted between a switch-configured
frame and any non-switched expansion frame using that switch. It can,
however, be inserted between two switch-configured frames. Different
attachment configurations are described in 5.6, “Attachment scenarios” on
page 176.

Once the frame number has been assigned, the server’s node numbers, 
which are based on the frame number, are automatically generated. The 
following system defaults are used: 

• The SP-Attached server is viewed as a single frame containing a single 
node. 

• The SP-Attached server occupies the slot one position. 

• Each SP-Attached server installed in the SP system subtracts one node 
from the total node count allowed in the system. However, as the server 
has frame-like features, it reserves sixteen node numbers that are used in 
determining the node number of nodes placed after the attached server. 
The algorithm for calculating the node_number is demonstrated in Figure 
76; for further information on the frame numbering issue, refer to Figure 92 
on page 177: 

node_number = (frame_number - 1) * 16 + slot_number

Figure 76. Node numbering
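
The node numbering formula of Figure 76 can be tried out with a few lines of Python. This is a minimal sketch of the arithmetic only; the function names and the range checks are assumptions added for the illustration and are not part of PSSP.

```python
SLOTS_PER_FRAME = 16  # each frame reserves sixteen node numbers


def node_number(frame_number: int, slot_number: int) -> int:
    """node_number = (frame_number - 1) * 16 + slot_number"""
    if frame_number < 1:
        raise ValueError("frame numbers start at 1")
    if not 1 <= slot_number <= SLOTS_PER_FRAME:
        raise ValueError("slot number must be between 1 and 16")
    return (frame_number - 1) * SLOTS_PER_FRAME + slot_number


def attached_server_node_number(frame_number: int) -> int:
    """An SP-Attached server occupies the slot one position of its frame."""
    return node_number(frame_number, 1)
```

For example, an SP-Attached server defined as frame 5 gets (5 - 1) * 16 + 1 = 65 as its node number, and the first node of a following frame 6 would be node 81, because the server reserves all sixteen numbers of its frame.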


5.2.2.1 Control workstation connections 

The SP-Attached server does not have a frame or node supervisor card, 
which limits the full hardware, control, and monitoring capabilities of the 
server from the SP CWS (unlike other SP nodes). However, it does have 
some basic capabilities, such as power on/off. 

Three CWS connections to the SP-Attached server are required for hardware 
control and software management: 

• An Ethernet connection to the SP-LAN for system administration 
purposes. 

• A custom-built RS-232 cable connected from the S70 operator panel to a
serial port on the CWS. It is used to emulate operator input at the operator
panel. An S70-specific protocol is used to monitor and control the S70
hardware. This protocol is known as the Service and Manufacturing
Interface (SAMI).

• A second custom-built RS-232 cable that must only use the S70 S1 serial
port. This is used to support the s1term connectivity. This cable, which is
part of the order features, includes a null modem and a gender-bender.

CWS considerations 

In connecting the SP-Attached server to the CWS, it is important to keep the
following CWS areas of concern in mind:

• When connecting the SP-Attached frame to the system, you need to make 
sure that the CWS has enough spare serial ports to support the additional 
connections. However, it is important to note that there is one restriction 
with the 16-port RS-232 connection. By design, it does not pass the 
required ClearToSend signal to the SAMI port of the SP-Attached server, 
and, therefore, the 16-port RS-232 cannot be used for the RS-232 
connectivity to the SP-Attached server. The eight-port and the 128-port 
varieties will support the required signal for connectivity to the 
SP-Attached server. 

• There are two RS-232 attachments for each S70/S7A/S80 SP attachment. 
The first serial port on the S70/S7A/S80 must be used for S1TERM 
connectivity. 

• Floor placement planning to account for the effective usable length of the 
RS-232 cable. 

The CWS-to-S70 connection cables are 15 meters in length, but only 11.5 
meters is effective. So, the S70 must be placed at a distance where the 
RS-232 cable to the CWS is usable. 

• In a HACWS environment, there will be no S70 control from the backup
CWS. In the case where a failover occurs to the backup CWS, hardmon
and s1term support of the S70 is not available until failback to the primary
CWS. The node will still be operational with switch communications and
SP Ethernet support.
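
The CWS considerations above lend themselves to a simple planning check. The following Python sketch is one possible way to encode them, under stated assumptions: the function name, the adapter labels, and the idea of passing the floor distance as a number are all hypothetical, and the sketch only covers the multiport adapter types named in the text.

```python
EFFECTIVE_RS232_CABLE_M = 11.5            # of the 15 m cable, per the text
USABLE_ADAPTERS = {"8-port", "128-port"}  # the 16-port type lacks ClearToSend


def check_cws_attachment(adapter: str, free_ports: int,
                         servers: int, distance_m: float) -> list:
    """Return a list of planning problems (empty if none were found)."""
    problems = []
    if adapter not in USABLE_ADAPTERS:
        problems.append("adapter does not pass ClearToSend to the SAMI port")
    if free_ports < 2 * servers:  # two RS-232 lines per attached server
        problems.append("not enough spare serial ports")
    if distance_m > EFFECTIVE_RS232_CABLE_M:
        problems.append("server is beyond the effective cable length")
    return problems
```

Returning a list of problems rather than a single yes/no keeps every violated consideration visible at once, which is usually what a site planner wants.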

5.2.2.2 SP frame connections 

The SP-Attached server connection to the SP frame is as follows: 

• 10 meter frame-to-frame electrical ground cable. 

The entire SP system must be at the same electrical potential. Therefore,
the frame-to-frame ground cables provided with the S70 server must be
used between the SP system and the S70 server in addition to the S70
server electrical ground.

Frame considerations 

In connecting the SP-Attached server to the SP Frame, it is important to have 
the following in mind: 

• The SP system must be a tall frame as the 49 inch short LowBoy frames 
are not supported for the SP-attachment. 

• The tall frame with the eight port switch is not allowed. 

• The SP-Attached server cannot be the first frame in the SP system. So,
the first frame in the SP system must be an SP frame containing at least
one node. This is necessary for the SDR_config code, which needs to
determine whether the frame is with or without a switch.

• A maximum of eight SP-Attached servers are supported in one SP system.
This means that if a switch is installed and eight servers are attached,
there must be eight available switch connections in the SP system, one
per SP-Attached server.

For complete power planning information, refer to Site and Hardware 
Planning Information, SA38-0508. 

5.2.2.3 Switch connection (required in a switched SP system) 

This is the required connection if the SP-Attached server is to be connected 
to a switched SP system. 

• The TB3PCI adapter, known as the RS/6000 SP system attachment 
adapter, of the SP-Attached server connects to the 16-port SP switch 
through a 10 meter switch cable. 

This TB3PCI adapter is used in those systems that are connected to the 
switch board using a PCI adapter, and it has the following characteristics: 

• It is driven by a 99 MHz 603e PowerPC processor.

• It has a sustained bandwidth of 85 MByte/sec. 

• It has components familiar to the SP environment. 

• Its device driver is derived from TB3MX. 

• It is supported only in the S70 server family. 

Switch considerations 

In connecting the SP-Attached server to the SP Switch, it is important to note 
the following: 

• The High Performance switch (HiPS) cannot be used with an SP-Attached
server since this switch is not supported in PSSP 3.1.

• The S70/S7A/S80 servers will be the first, and currently the only, nodes
attached to the switch using an RS/6000 SP Attachment adapter.

• Only one RS/6000 SP Attachment adapter is allowed per SP-Attached 
server. 

• The RS/6000 SP Attachment adapter that is placed in the SP-Attached 
server requires: 

• One valid, unused switch port on the SP Switch corresponding to a 
legitimate node slot in your SP configuration. 

• The SP attachment adapter reserves three media slots in the I/O tower 
of the S70 server and has the following placement restrictions: 

• Must be installed in slot 10 of the SP-Attached server’s I/O tower. 

• Slot 9 must be left open to ensure that the adapter has sufficient 
bandwidth. 

• Slot 11 must be left open to provide clearance for the switch 
adapter’s heat sinks. 

These restrictions are illustrated in Figure 77. 



Figure 77. S70 Switch adapter attachment slot 

• Floor placement planning to account for the effective, usable switch
cable.

The SP Switch-to-S70 connection cable is 10 meters in length, but only
6.5 meters is effective. So, the S70 switch adapter located in slot 10
must be within 6.5 meters of the SP Switch as illustrated in Figure 78.



Figure 78. S70 floor placement 
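
The adapter placement restrictions and the cable length limit listed above can be checked mechanically. The Python sketch below shows one way to do so; the function name, the dictionary layout (slot number mapped to an adapter label), and the label string itself are assumptions made for this illustration.

```python
SWITCH_ADAPTER = "SP attachment adapter"  # hypothetical label for the adapter
EFFECTIVE_SWITCH_CABLE_M = 6.5            # of the 10 m cable, per the text


def check_switch_attachment(slots: dict, distance_m: float) -> list:
    """slots maps an I/O slot number to the adapter installed there."""
    problems = []
    if slots.get(10) != SWITCH_ADAPTER:
        problems.append("adapter must be installed in slot 10")
    for slot in (9, 11):
        # Slot 9 is kept free for bandwidth, slot 11 for heat sink clearance.
        if slots.get(slot) is not None:
            problems.append(f"slot {slot} must be left open")
    if sum(1 for a in slots.values() if a == SWITCH_ADAPTER) > 1:
        problems.append("only one adapter is allowed per server")
    if distance_m > EFFECTIVE_SWITCH_CABLE_M:
        problems.append("adapter is beyond the effective cable length")
    return problems
```

A configuration with the adapter in slot 10, nothing in slots 9 and 11, and the server within 6.5 meters of the SP Switch produces an empty list.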

SP-Attached server considerations 

In connecting the SP-Attached server to the SP system, it is important to 
have in mind the following potential concerns: 

• Supported adapters 

All adapters currently supported in the SP environment are supported with 
the SP-Attached servers (S70). However, not all currently supported 
SP-Attached server adapters are supported in the SP Switch-attached 
server environment. If the S70 possesses adapters that are not currently 
supported in the SP environment, they must be removed from the 
SP-Attached server. 

The following is a list of supported adapters: 

• F/C 2741 FDDI SK-NET LP SAS 

• F/C 2742 FDDI SK-NET LP DAS 

• F/C 2743 FDDI SK-NET UP SAS 

• F/C 2751 S/390 ESCON Channel Adapter 

• F/C 2920 Token Ring Auto Lanstream 

• F/C 2943 EIA 232/RS-422 8-port Asynchronous Adapter 

• F/C 2944 WAN RS-232 128-port 

• F/C 2962 2-port Multiprotocol X.25 Adapter 

• F/C 2963 ATM 155 TURBOWAYS UTP 

• F/C 2968 Ethernet 10/100 MB

• F/C 2985 Ethernet 10 MB BNC

• F/C 2987 Ethernet 10 MB AUI 

• F/C 2988 ATM 155 MMF 

• F/C 6206 Ultra SCSI SE 

• F/C 6207 Ultra SCSI DE 

• F/C 6208 SCSI-2 F/W SE 

• F/C 6209 SCSI-2 F/W DE 

• F/C 6215 SSA RAID 5 

• SP-Attached server Ethernet required as en0:

For the S70 server, only the 10 Mbps BNC or the 10 Mbps AUI Ethernet
adapters are supported for SP-LAN communication in accordance with the
existing SP-LAN configuration. Note that the BNC adapter provides the
BNC cables, but the AUI Ethernet adapter does not provide the twisted
pair cables.

The SP-LAN adapter must be configured as the en0 adapter of the
SP-Attached server (that is, the lowest numbered Ethernet bus slot in the
first I/O tower).

• Minimum code requirements: 

The CWS and SP-Attached server must be running AIX 4.3.2 and PSSP 
3.1 at the minimum. Hence, an existing S70 may require an AIX upgrade 
before installation of PSSP 3.1 to achieve SP-attachment. 


- Note - 

Each SP-Attached server S70 must have a PSSP 3.1 license, separately
chargeable against each S70 serial number.
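
The minimum code requirement (AIX 4.3.2 and PSSP 3.1) amounts to a version comparison. The Python sketch below illustrates the comparison on dotted level strings; the function names are hypothetical, and how the levels are obtained on a real system is outside the scope of the sketch.

```python
MIN_AIX = (4, 3, 2)
MIN_PSSP = (3, 1)


def parse_level(text: str) -> tuple:
    """Turn a dotted level such as '4.3.2' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(aix: str, pssp: str) -> bool:
    """True if both the AIX and the PSSP level reach the minimum."""
    return parse_level(aix) >= MIN_AIX and parse_level(pssp) >= MIN_PSSP
```

Comparing tuples of integers avoids the pitfalls of comparing the level strings directly, where "4.10" would sort before "4.3".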


5.3 Installation and configuration 

The SP-Attached server is treated as similarly as possible to a frame with a 
node. However, there are some important distinctions that have to be 
addressed during SP-Attached server configuration, namely the lack of frame 
and node supervisor cards and support for two ttys instead of one as 
described in 5.2.2, “SP-Attached server attachment” on page 147. 

Information that is unique to the SP-Attached server is entered in the 
configuration of this server. Once the administrator configures the necessary 
information about the SP-Attached server processor in the SDR, then the 
installation should proceed the same as any standard SP node in the SP
administrative network.

Configuration considerations

• Add two ttys on the CWS. 

• Define the Ethernet adapter on the SP-Attached server. 

• In a switched system, configure the SP-Attached server to the SP Switch. 

• Frame definition of SP-Attached server: 

The rules for assigning the frame number of the SP-Attached server are 
detailed in section 5.2.2, “SP-Attached server attachment” on page 147. 

The SP-Attached server must be defined to PSSP using the spframe 
command and using the new options that are available for SP-Attached 
server for this command: 

/usr/lpp/ssp/bin/spframe -p {hardware_protocol}
-n {starting_switch_port}
[-r {yes|no}] [-s {s1tty}]
start_frame frame_count starting_tty_port

Alternatively, you can use the smitty nonsp_frame_dialog menu as shown
in Figure 79.


                        Non-SP Frame Information

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                 [Entry Fields]
* Start Frame                                    [ ]              #
* Frame Count                                    [ ]              #
* Starting Frame tty port                        [/dev/tty0]
* Starting Switch Port Number                    [ ]              #
  s1 tty port                                    [ ]
* Frame Hardware Protocol                        [SAMI]
  Re-initialize the System Data Repository       no               +

F1=Help        F2=Refresh       F3=Cancel      F4=List
Esc+5=Reset    Esc+6=Command    Esc+7=Edit     Esc+8=Image
Esc+9=Shell    Esc+0=Exit       Enter=Do


Figure 79. Non-SP frame information

This menu will request frame number, tty ports, and switch port numbers.
This will establish hardmon communications with the SP-Attached server
and create the frame object in the SDR.
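
To make the spframe syntax shown above more concrete, the following Python sketch assembles an argument list in the same order. It only builds and prints the command line and does not run anything. The function name is hypothetical, and the frame number, switch port, and tty device names in the example call are made-up values, not recommendations; the SAMI protocol value is taken from the entry field in Figure 79.

```python
def build_spframe_command(hardware_protocol, starting_switch_port,
                          start_frame, frame_count, starting_tty_port,
                          reinitialize_sdr=None, s1_tty=None):
    """Assemble the spframe argument list in the order of the syntax above."""
    cmd = ["/usr/lpp/ssp/bin/spframe",
           "-p", str(hardware_protocol),
           "-n", str(starting_switch_port)]
    if reinitialize_sdr is not None:   # optional -r {yes|no}
        cmd += ["-r", "yes" if reinitialize_sdr else "no"]
    if s1_tty is not None:             # optional -s {s1tty}
        cmd += ["-s", s1_tty]
    cmd += [str(start_frame), str(frame_count), starting_tty_port]
    return cmd


# Hypothetical values: frame 5, one frame, switch port 14, two tty devices.
example = build_spframe_command("SAMI", 14, 5, 1, "/dev/tty0",
                                reinitialize_sdr=False, s1_tty="/dev/tty1")
print(" ".join(example))
```

Keeping the command as a list of arguments rather than one string makes it easy to review each value before the command is typed or passed to a wrapper script.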

• Hardware Ethernet address collection: 

The MAC address of the SP-Attached server is retrieved by sphrdwrad in
the same way as for a normal SP node and placed in the SDR.

Now that the SP-Attached server is configured as an SP-Attached server
frame in the SDR, it is ready for standard configuration and installation as 
a normal node. Full instructions are defined in PSSP Installation and 
Migration Guide, GA22-7347. 

• Boot/Install consideration: 

The default setup for boot/install servers is that the CWS is the boot/install 
server for a single frame system. In a multiple frame system, the CWS 
installs the first node in each frame and defines this node as the 
boot/install server for the remaining nodes in its frame. 

If, however, the multiple frame system contains an SP-Attached server, 
the CWS remains as the default boot/install server for the first node in 
each frame. The first node in each SP frame becomes the boot/install 
server with the exception of the SP-Attached server, which is treated as a 
node instead of a frame. 
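
The default assignment described above can be sketched in a few lines of Python. This is an illustration only, not PSSP code: the dictionary layout and the "cws" label are assumptions, and the sketch assumes that an SP-Attached server frame does not count as a frame when deciding whether the system is a single frame system.

```python
# Sketch only: the data layout and the "cws" label are assumptions.
CWS = "cws"


def default_boot_install_servers(frames, attached_frames=frozenset()):
    """Map each node number to its default boot/install server.

    frames          -- dict: frame number -> list of node numbers in that frame
    attached_frames -- frame numbers that are SP-Attached servers
    """
    # Assumption: only real SP frames decide single versus multiple frame.
    sp_frames = [f for f in frames if f not in attached_frames]
    single_frame = len(sp_frames) <= 1
    servers = {}
    for frame, nodes in frames.items():
        ordered = sorted(nodes)
        for node in ordered:
            if frame in attached_frames or single_frame or node == ordered[0]:
                # Installed directly by the control workstation.
                servers[node] = CWS
            else:
                # Installed by the first node of its own frame.
                servers[node] = ordered[0]
    return servers


if __name__ == "__main__":
    layout = {1: [1, 5, 9, 13], 2: [17, 21], 3: [33]}
    print(default_boot_install_servers(layout, attached_frames={3}))
```

In this hypothetical layout, nodes 1, 17, and 33 are served by the control workstation, while the remaining nodes are served by the first node of their own frame.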

• Installing the node: 

The configuration and installation of the SP nodes and SP-Attached
servers are identical. All of the installation operations will be performed
over the Ethernet with one of the tty lines providing the s1term capabilities
and the other tty line providing the hardware control and monitoring
functions.

• System partitioning consideration: 

If the system has multiple partitions defined, and you wish to add an 
SP-Attached server, you do not need to bring the system down to one 
partition, as the SP-Attached server appears as a standard SP node to the 
system partition. 

Each SP-Attached server has appropriate frame, slot values, and switch
port numbers. These values are accommodated by existing attributes in
the relevant frame, node, and Syspar_map SDR classes.

When the SP-Attached server frame/node is defined to the system with 
the spframe command, the switch port number to which the node is 
connected is identified. This number is also necessary in a switchless
system to support system partitioning.

If it is necessary to change the switch port number of the SP-Attached
server, then the node has to be deleted and redefined with a new switch 
port number. Deleting this node should be done by deleting the frame to 
ensure that no inconsistent data is left in the SDR. 

• If more than one partition exists, repartition to a single partition.

• Invoke spdelfram to delete the SP-Attached server frame and node
definitions.

• Recable the server to a new switch port. 

• Invoke spframe to redefine the SP-Attached server frame and node 
to specify the new switch port number. 

• If the system was previously partitioned, repartition back to the 
system partitioning configuration. 

Considerations when integrating an existing SP-Attached server: 

Perform the following steps to add an existing SP-Attached Server and 
preserve its current software environment: 

1. Physical attachment 

When integrating an existing SP-Attached server node into your
system, it is recommended (though not mandatory) that the frame
be added to the end of your system to avoid having to
reconfigure the SDR. Different attachment scenarios are
described in “Attachment scenarios” on page 176.

2. Software levels 

If your SP-Attached server is not at AIX 4.3.2, upgrade to that level. 
Ensure that the PSSP code_version is set to PSSP-3.1. 

3. Customize node 

To perform a preservation install of an SP-Attached server with 
PSSP software, the node must be set to customize instead of install 
in the SDR. For example: 

spbootins -r customize -l 33

4. Mirroring 

If the root volume group of the SP-Attached server has been 
mirrored, and the mirroring is to be preserved, the information about 
the existing mirrors must be recorded in the SDR; otherwise, the 
root volume group will be unmirrored during customization. 

For example, if the root volume group of the S70 Advanced Server
has two copies on two physical disks in locations 30-68-00-0,0 and
30-68-00-2,0 with quorum turned off, enter the following to preserve
the mirroring:

spchvgobj -r rootvg -c 2 -q false -h 30-68-00-0,0:30-68-00-2,0 -l 33

To verify the information, enter:

splstdata -b -l 33

5. Set up the Name Resolution of the SP-Attached server 

For PSSP customization, the following must be resolvable on the 
SP-Attached server: 

• The control workstation host name. 

• The name of the boot/install server’s interface that is attached to the
SP-Attached server’s en0 interface.

6. Set up routing to the control workstation host name 

If a default route exists on the SP-Attached server, it must be 
deleted. If it is not removed, customization will fail when it tries to 
set up the default route defined in the SDR. In order for 
customization to occur, a static route to the control workstation’s 
host name must be defined. For example, if the control workstation’s
host name is its Token Ring address, such as 9.114.73.76, and the
gateway is 9.114.73.256, enter:

route add -host 9.114.73.76 9.114.73.256 

7. FTP the SDR_dest_info file 

During customization, certain information will be read from the SDR. 
In order to get to the SDR, the /etc/SDR_dest_info file must be 
FTPed from the control workstation to the /etc/SDR_dest_info file of 
the SP-Attached server ensuring the mode and ownership of the file 
is correct. 

8. Verify perfagent 

Ensure that perfagent.tools 2.2.32.x is installed on the
SP-Attached server.

9. Mount the pssplpp directory 

Mount the /spdata/sys1/install/pssplpp directory from the boot/install
server on the SP-Attached server. For example, issue:

mount k3n01:/spdata/sys1/install/pssplpp /mnt

10. Install ssp.basic

Install ssp.basic and its prerequisites onto the SP-Attached server.
For example:

installp -aXgd /mnt/PSSP-3.1 ssp.basic 2>&1 | tee /tmp/install.log

11. Unmount the pssplpp directory

Unmount the /spdata/sys1/install/pssplpp directory of the
boot/install server from the SP-Attached server. For example:

umount /mnt 

12. Run pssp_script 

Run the pssp_script by issuing: 

/usr/lpp/ssp/install/bin/pssp_script

13. Reboot 

Perform a reboot of the SP-Attached server. 
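
The mirroring command in step 4 follows a regular pattern: the -h value is the list of disk locations joined with colons, and in the example the -c value equals the number of disks. The following Python sketch builds that command line. It is an illustration only; the helper name is hypothetical, it uses only the flags shown in step 4, and it assumes one copy per listed disk.

```python
def mirror_command(node, disk_locations, quorum=False, volume_group="rootvg"):
    """Build the spchvgobj command line shown in step 4.

    Assumption: the number of copies equals the number of disk
    locations, as in the two-disk example in the text.
    """
    if not disk_locations:
        raise ValueError("at least one disk location is required")
    copies = len(disk_locations)
    disks = ":".join(disk_locations)
    quorum_flag = "true" if quorum else "false"
    return (
        f"spchvgobj -r {volume_group} -c {copies} "
        f"-q {quorum_flag} -h {disks} -l {node}"
    )


if __name__ == "__main__":
    print(mirror_command(33, ["30-68-00-0,0", "30-68-00-2,0"]))
```

The sketch only produces the text of the command; it does not run it or touch the SDR.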

5.3.1 Pre-installation checklist 

Using the SP configurator, the following hardware and software components 
for the SP-Attached server should be ordered: 

1. Feature 9122 Node Attachment 
The feature provides the following: 

• 15 meters custom RS-232 cable between S70 and CWS (S1TERM). 

• 15 meters custom RS-232 cable between S70 and CWS (SAMI). 

• This feature includes the frame-to-frame electrical ground cable. 

2. Feature 9123 Frame Attachment 

This feature keeps track of how many frames are in your SP system to 
avoid exceeding the limit. 

3. Feature 5700/1/2 for SP-Attached Server PSSP 

PSSP 3.1 is a separately charged software license for each SP-Attached 
server. 

AIX 4.3.2 is included with the SP-Attached server and preloaded at the 
factory. Therefore, it does not need to be ordered separately. 

This feature must be ordered for a non-switched system as well. 

4. 9222 Node Attachment Ethernet BNC Boot Feature 
Includes BNC cable for SP Ethernet Communications. 

5. 9223 Node Attachment Ethernet Twisted Pair Boot Feature
This feature does not provide twisted pair cables.

6. The following features are optional and are only required if the
SP-Attached server is to be attached to the switch. In a switchless
system, these features are not necessary.

• Feature 8396 RS/6000 SP System Attachment Adapter 

• Feature 9310, 10 meter SP switch cable 


5.4 PSSP support 

This section describes the PSSP software support for the SP-Attached server.
Of special interest is the fact that the SP-Attached server does not use the SP
node or frame supervisor cards. Hence, the software modifications and
interface to the SP-Attached server must simulate the architecture of the SP
Frame Supervisor Subsystem such that the boundaries between an SP node
and an SP-Attached server node are minimal.

5.4.1 SDR classes 

The SDR contains system information describing the SP hardware and 
operating characteristics. Several class definitions have changed to 
accommodate the support for SP-Attached servers, such as frame, node, and 
Syspar_map classes. A new class definition has been added in PSSP 3.1, the 
NodeControl class. 

The classes that contain information related to SP-Attached servers are 
briefly described. 

• Frame class 

Currently, the frame class is used to contain information about each SP 
frame in the system. This information includes physical characteristics 
(number of slots, whether it contains a switch, and so forth), tty port, 
hostname, and the internal attributes used by the switch subsystem. 

SP-Attached server nodes do not have physical frame hardware and do 
not contain switch boards. However, they do have hardware control
characteristics, such as tty connections and associated Monitor and 
Control Nodes (MACN). Therefore, an SDR Frame Object is associated 
with each SP-Attached server node to contain these hardware control 
characteristics. 

Two new attributes have been added to the frame class:
hardware_protocol and s1_tty.

The hardware_protocol attribute distinguishes the hardware
communication method between the existing SP frames and the new
frame objects associated with SP-Attached server nodes. For these new
nodes, the hardware communication method is SAMI (Service and
Manufacturing Interface), which is the protocol used to communicate
across the serial connection to the SP-Attached server service processor.

The attribute s1_tty is used only for the SP-Attached server nodes and
contains the tty port for the S1 serial port connection established by the
s1term command.

A typical example of a frame class with the new attributes and associated 
values is illustrated in Figure 80. 


Attribute            Frame 1      Frame 2
frame_number         1            2
tty                  /dev/tty0    /dev/tty2
frame_type           switch
MACN                 spew         spew
b_MACN
slots                16           1
f_in_config
snn_index            0
switch_config        0
hardware_protocol    sp           SAMI
s1_tty                            /dev/tty1


Figure 80. Example of a Frame class with an SP-Attached server 

• Node class 

The SDR Node class contains node-specific information used throughout 
PSSP. Similarly, there will be an SDR Node object associated with the 
SP-Attached server. 

SP frame nodes are assigned a node_number based on the algorithm 
described in section 5.2.2, “SP-Attached server attachment” on page 147. 

Likewise, the same algorithm is used to compute the node number of an
SP-Attached server frame node, where the SP-Attached server occupies
the first and only slot of its frame. This means that for every SP-Attached
server frame node, 16 node numbers will be reserved, of which only the
first one will ever be used.

The node number is the key value used to access a node object. 

Some entries of the Node class example are outlined in Figure 81 on page
164.

Node Class               Nodes in an SP               Attached S70 Node
Node Number              1-16                         17
Slot Number              1-16                         1 (always)
Switch_node_number       0-15                         1
Switch_chip_port         0-15                         any port used from 0-15
Switch_chip              4-7                          any chip used from 4-7
Switch_number            1                            1
Boot_device              en0                          en0
Description              112_MHz_SMP_High             7017-S70
                         66_MHz_PWR2_Thin
                         66_MHz_PWR2_Wide
Platform                 rs6k                         chrp
hardware_control_type    161 high, 97 thin, 81 wide,  10 (S70/S7A)
                         ..., etc.


Figure 81. Entries of the Node class for SP nodes and SP-Attached server 

The platform attribute has a value of Common Hardware Reference 
Platform (CHRP) for the SP-Attached server. 

The hardware_control_type key value is used to access the NodeControl
class. A value of 10 indicates an SP-Attached server.
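
The node numbering described for the Node class can be sketched as follows. This is a minimal illustration that assumes the rule node_number = (frame_number - 1) * 16 + slot_number, which is consistent with the 16 node numbers reserved per frame and with node number 17 for an SP-Attached server in frame 2. The function names are hypothetical.

```python
SLOTS_PER_FRAME = 16  # node numbers reserved for every frame


def node_number(frame_number, slot_number=1):
    """Return the node number for a slot in a frame.

    An SP-Attached server always occupies slot 1 of its own frame,
    so the default slot covers that case.
    """
    if frame_number < 1:
        raise ValueError("frame numbers start at 1")
    if not 1 <= slot_number <= SLOTS_PER_FRAME:
        raise ValueError("slot numbers run from 1 to 16")
    return (frame_number - 1) * SLOTS_PER_FRAME + slot_number


def reserved_node_numbers(frame_number):
    """Return the range of node numbers reserved for a frame."""
    first = node_number(frame_number, 1)
    return range(first, first + SLOTS_PER_FRAME)


if __name__ == "__main__":
    print(node_number(2))                   # SP-Attached server in frame 2
    print(list(reserved_node_numbers(2)))   # numbers reserved for that frame
```

For an SP-Attached server in frame 2, only the first of the reserved numbers is ever used; the others stay reserved.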

• Syspar_map class 

The Syspar_map class contains one entry for each switch port, assuming 
each frame would contain a switch. 

As the SP-Attached server has node characteristics, it has an entry in the 
Syspar_map class for that node with no new attributes. 

The used attribute of the Syspar_map will be set to one for the 
SP-Attached server node to indicate that there is a node available to 
partition. Since this node will be attached to the switch, the 
switch_node_number will be set appropriately based on the switch port in 
an existing SP frame that the SP-Attached server node is connected to. 

In a switchless system, the switch_node_number will be assigned by the 
administrator using the spframe command. 

An example of the Syspar_map class is shown in Figure 82 on page 165.

syspar_name  syspar_addr  node_number  switch_node_number  used  node_type
k48s         9.114.11.48  1            0                   1     standard
k48s         9.114.11.48  17           1                   1     standard
k48s         9.114.11.48  3            2                   1     standard
k48s         9.114.11.48  16           15                  1     standard


Figure 82. Example of the Syspar_map class with SP-Attached server 

The SDR_config command has been modified to accommodate these new 
SDR attribute values and now handles the assignment of 
switch_port_numbers for SP-Attached server nodes. 

• NodeControl class 

In order to support different levels of hardware control for different types of 
nodes, a new SDR class has been defined to store this information. 

The NodeControl class is a global SDR class that is not partition- 
sensitive. It contains one entry for each type of node that can be 
supported on an SP system. Each entry contains a list of capabilities that 
are available for that type of node. This is static information loaded during 
installation and is not changed by any PSSP code. This static information 
is required by the SDR_config script to properly configure the node. 

An example of the NodeControl class is illustrated in Figure 83. 


NodeControl Class

Type  Capabilities                               Slots_used  Platform_type  Processor_type
65    Power,reset,tty,KeySwitch,LED,NetworkBoot  1           rs6k           UP
161   Power,reset,tty,KeySwitch,LCD,NetworkBoot  4           rs6k           MP
33    Power,reset,tty,KeySwitch,LED,NetworkBoot  1           rs6k           UP
10    Power,tty,LCD,NetworkBoot                  1           chrp           MP
177   Power,reset,tty,LCD,NetworkBoot            1           chrp           MP
115   Power,reset,tty,KeySwitch,LED,NetworkBoot  2           rs6k           UP


Figure 83. Example of the NodeControl class with the SP-Attached server 

The key link between the Node class and the NodeControl class is the
node type, which is a new attribute stored in the SDR Node object. The
SP-Attached server has a node type value of 10 with hardware capabilities
of power on/off, tty, LCD, and network boot as outlined in Figure 84.

Perspectives routines and hardmon commands access this class to
determine the hardware capabilities for a particular node before
attempting to execute a command for a given node.
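
The lookup from a node's hardware_control_type into the NodeControl class can be pictured as a simple keyed table. The Python sketch below copies the entries of Figure 83 into a dictionary; the dictionary layout and the function name are assumptions made for illustration and do not reflect how the SDR stores or serves this data.

```python
# Entries copied from Figure 83; the in-memory layout is an assumption.
NODE_CONTROL = {
    65:  {"capabilities": {"Power", "reset", "tty", "KeySwitch", "LED", "NetworkBoot"},
          "slots_used": 1, "platform_type": "rs6k", "processor_type": "UP"},
    161: {"capabilities": {"Power", "reset", "tty", "KeySwitch", "LCD", "NetworkBoot"},
          "slots_used": 4, "platform_type": "rs6k", "processor_type": "MP"},
    33:  {"capabilities": {"Power", "reset", "tty", "KeySwitch", "LED", "NetworkBoot"},
          "slots_used": 1, "platform_type": "rs6k", "processor_type": "UP"},
    10:  {"capabilities": {"Power", "tty", "LCD", "NetworkBoot"},
          "slots_used": 1, "platform_type": "chrp", "processor_type": "MP"},
    177: {"capabilities": {"Power", "reset", "tty", "LCD", "NetworkBoot"},
          "slots_used": 1, "platform_type": "chrp", "processor_type": "MP"},
    115: {"capabilities": {"Power", "reset", "tty", "KeySwitch", "LED", "NetworkBoot"},
          "slots_used": 2, "platform_type": "rs6k", "processor_type": "UP"},
}


def supports(hardware_control_type, capability):
    """Check whether a node type lists a capability before acting on it."""
    entry = NODE_CONTROL.get(hardware_control_type)
    if entry is None:
        raise KeyError(f"unknown node type {hardware_control_type}")
    return capability in entry["capabilities"]


if __name__ == "__main__":
    # Type 10 is the SP-Attached server (S70/S7A).
    print(supports(10, "Power"))
    print(supports(10, "KeySwitch"))
```

A caller would consult such a table first and skip operations, such as a key switch change, that the node type does not list.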


5.4.2 Hardmon 

Hardmon is a daemon that is started by the System Resource Controller 
(SRC) subsystem that runs on the CWS. It is used to control and monitor the 
SP hardware (frame, switch, and nodes) by opening a tty that communicates 
using an internal protocol to the SP frame supervisor card through a serial 
RS-232 connection between the CWS and SP frame. 

The new SP-Attached server does not have a frame or node supervisor card
that can communicate with the hardmon daemon. Therefore, a new
mechanism to control and monitor SP-Attached servers is provided in
PSSP 3.1.

Hardmon provides support for SP-Attached servers in the following way: 

• It discovers the existence of SP-Attached servers. 

• It controls and monitors the state of SP-Attached servers, such as power 
on/off. 

Discover the SP-Attached server 

For hardmon to discover the hardware, it must first identify the hardware and 
its capabilities. Today, for each frame configured in the SDR’s frame class, 
hardmon opens a tty defined by the tty field. A two-way communication to the 
frame supervisor through the RS-232 interface occurs where hardmon sends 
hardware control commands and receives state data in the form of packets. 

With PSSP 3.1, two new fields have been added to the SDR’s frame class: 
hardware_protocol and s1_tty. They enable hardmon to determine the new 
hardware that is externally attached to the SP and also what software 
protocol must be used to communicate to this hardware. 

Currently, the only two supported values for the hardware_protocol field are 
SP and SAMI. However, these values are extensible for new hardware 
protocol drivers that will emerge as more externally connected hardware is 
supported. 

Upon initialization, hardmon reads its entries in the SDR Frame class and 
also examines the value of the hardware_protocol field to determine the type 
of hardware and its capabilities. If the value read is SP, this indicates that SP 
nodes are connected to hardmon through the SP’s Supervisor subsystem. A 
value of SAMI is specific to the S70/S7A/S80 hardware since it is the SAMI 
software protocol that allows the communication, both sending messages and 
receiving packet data, to the S70/S7A/S80’s Service Processor. 

Once hardmon recognizes the existence of one or more S70/S7A/S80s in the
configuration, it starts a new process - the S70 daemon. One S70 daemon is
started for each frame that has an SDR Frame class hardware_protocol value
of SAMI. Now, hardmon can send commands and process packets or serial
data as it would with normal SP frames. This is illustrated in Figure 85 on
page 168.

[Figure 85 labels: frame object with hardware_protocol = SAMI,
s1_tty = /dev/tty1, tty = /dev/tty2; frame object with
hardware_protocol = sp, s1_tty = "", tty = /dev/tty0; packets of state
data; sockets; s1term over s1_tty (RS-232)]

Figure 85. Hardmon flow of control 

It is important to note that only hardmon starts the S70 daemon and no other 
invocation external to hardmon is possible. In addition, the parent hardmon 
daemon starts a separate S70 daemon for each S70 frame configured in the 
SDR Frame class. 

The S70 daemon starts with the following flags: 

/usr/lpp/ssp/install/bin/S70d -d 0 2 1 8 /dev/tty2 /dev/tty1

where -d indicates the debug flag, 0 is the debug option, 2 is the frame
number, 1 is the slot number (which is always 1), 8 is the file descriptor of the
S70d’s side of the socket that is used to communicate with hardmon,
/dev/tty2 is the tty that is used to open the SAMI/MI operator panel port, and
/dev/tty1 is the serial tty.
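
The relationship between the SDR Frame class and the S70 daemon arguments can be sketched in Python. This is an illustration only: hardmon itself starts the daemon, and the sketch merely assembles the argument list in the order described above. The frame dictionaries, the function names, and the way the socket file descriptor is supplied are all assumptions.

```python
S70D_PATH = "/usr/lpp/ssp/install/bin/S70d"


def s70d_arguments(frame, socket_fd, debug_option=0):
    """Assemble the S70d argument list for one SAMI frame.

    frame     -- dict with frame_number, tty, s1_tty, hardware_protocol
    socket_fd -- descriptor of the daemon's side of the hardmon socket
    """
    if frame["hardware_protocol"].upper() != "SAMI":
        raise ValueError("S70d is only started for SAMI frames")
    return [
        S70D_PATH,
        "-d", str(debug_option),
        str(frame["frame_number"]),
        "1",                    # the slot number is always 1
        str(socket_fd),
        frame["tty"],           # SAMI/MI operator panel port
        frame["s1_tty"],        # serial tty
    ]


def plan_s70_daemons(frames, socket_fds):
    """Return one argument list per frame whose protocol is SAMI."""
    return {
        f["frame_number"]: s70d_arguments(f, socket_fds[f["frame_number"]])
        for f in frames
        if f["hardware_protocol"].upper() == "SAMI"
    }


if __name__ == "__main__":
    frames = [
        {"frame_number": 1, "tty": "/dev/tty0", "s1_tty": "",
         "hardware_protocol": "SP"},
        {"frame_number": 2, "tty": "/dev/tty2", "s1_tty": "/dev/tty1",
         "hardware_protocol": "SAMI"},
    ]
    for number, argv in plan_s70_daemons(frames, {2: 8}).items():
        print(number, " ".join(argv))
```

With the two frame objects of Figure 80 and file descriptor 8, the sketch yields an argument list for frame 2 only, because frame 1 uses the SP protocol.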

S70 daemon

The S70 daemon interfaces to the S70 hardware and emulates the frame and 
node supervisor by accepting commands from hardmon and responding with 
hardware state information in the same way as the frame supervisor would. 
Its basic functions are: 

• It polls the S70 for changes in hardware status and returns the
status to hardmon in the form of frame packet data.

• It communicates with the S70 hardware through the SAMI/MI interface.

• It accepts hardware control commands from hardmon to change the power
state of the S70 and translates them into SAMI protocol, the language that
the Manufacturing Interface (MI) understands. It then sends the command
to the hardware.

• It opens the tty defined by the tty field in the SDR Frame class through
which the S70 daemon communicates to the S70 serial connection.

• It supports an interface to the S70 S1 serial port to allow console
connections through s1term.

• It establishes and maintains data handshaking in accordance with the S70
Manufacturing Interface (MI) requirements.

Dataflow 

Hardmon requests are sent to the S70 daemon where the command is 
handled by one of two interface components of the S70 daemon, the frame 
supervisor interface, or the node supervisor interface. 

The frame supervisor interface is responsible for keeping the state
data in the frame’s packet current and for formatting the frame packet for
return to hardmon. It accepts hardware control commands from hardmon that
are intended for itself and passes on to the node supervisor interface the
commands intended to control the S70/S7A/S80 node.

The node supervisor interface polls state data from the S70/S7A/S80
hardware to keep the state data in the node’s packet current. The node
supervisor interface translates the commands received from the frame
supervisor interface into the S70/S7A/S80 software protocol and sends the
command through to the S70/S7A/S80 service processor.

If the hardmon command is intended for the frame, the frame supervisor entity
of the S70d handles it. If intended for the node, the node supervisor entity
converts it to SAMI protocol and sends it out the SAMI/MI interface file
descriptor as illustrated by Figure 86 on page 170.

Figure 86. S70 daemon internal flow

The S70 daemon uses SAMI protocol, which takes the form of 4-byte 
command words, to talk to the S70’s Manufacturing Interface. This interface 
communicates with the S70’s operator panel, which in turn communicates 
with the S70’s Service Processor. It is the Service Processor that contains the 
instruction that acts upon the request. Data returned to the S70 daemon 
follows the reverse flow. 
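
The routing described in this section can be summarized as a list of hops. The Python sketch below only names the components a command passes through; it does not model the SAMI command words, whose contents are not given here, and the function names are hypothetical.

```python
FRAME_IF = "frame supervisor interface"
NODE_IF = "node supervisor interface"


def command_path(target):
    """List the components a hardmon command passes through.

    target -- "frame" for commands meant for the emulated frame,
              "node" for commands meant for the S70 itself
    """
    if target == "frame":
        # Handled inside the S70 daemon; nothing reaches the hardware.
        return ["hardmon", FRAME_IF]
    if target == "node":
        return [
            "hardmon",
            FRAME_IF,            # passes node commands on
            NODE_IF,             # translates to SAMI protocol
            "SAMI/MI interface",
            "operator panel",
            "service processor",
        ]
    raise ValueError(f"unknown target: {target}")


def reply_path(target):
    """Returned data follows the reverse flow."""
    return list(reversed(command_path(target)))


if __name__ == "__main__":
    print(" -> ".join(command_path("node")))
    print(" -> ".join(reply_path("node")))
```

The split mirrors the text: frame commands stop at the frame supervisor interface, while node commands continue through to the service processor.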

Monitoring of SP-Attached server 

For hardmon to monitor the hardware, it must first identify the hardware and 
its capabilities. 

The hardware control type is determined from the SDR Node class as a
hardware_control_type attribute. This attribute is the key into the NodeControl
class. The NodeControl class will indicate the hardware capabilities for
monitoring. This relationship is illustrated in Figure 84 on page 166.

Hardmon Resource Monitor Daemon

The Hardmon Resource Monitor Daemon (HMRMD) supports the Event 
Management resource variables to monitor nodes. With the new SP-Attached 
servers, new resource variables are required to support their unique 
information. 

There are four new hardmon variables that will be integrated into the 
Hardmon Resource Monitor for the SP-Attached servers. They are 
SRChasMessage, SPCNhasMessage, src, and spcn. Historical states, such 
as nodePower, serialLinkOpen, and type, are also supported by the 
SP-Attached servers. The mechanics involved with the definition of these 
variables are no different than with previous variables and can be viewed 
through Perspectives and in conjunction with the Event Manager. 

In order to recognize these new resource variables, the Event Manager must 
be stopped and restarted on the CWS and all the nodes in the affected 
system partition. 


5.5 User interfaces 

This section highlights the changes in the different user interface panels and 
commands that have been made to represent the SP-Attached server to the 
user. 

5.5.1 Perspectives 

As SP must now support nodes with different levels of hardware capabilities, 
an interface was architected to allow applications, such as Perspectives, to 
determine what capabilities exist for any given node and respond accordingly. 
This interface will be included with a new SDR table, the NodeControl class. 

The Perspectives interface needs to reflect the new node definitions: Those 
that are physically not located on an SP frame and those nodes that do not 
have full hardware control and monitoring capabilities. 

There is a typical object representing the SDR Frame object for the 
SP-Attached server node in the Frame/Switch panel. This object has a unique 
pixmap placement to differentiate it from a high and low frame, and this 
pixmap is positioned according to its frame number in the Perspectives panel. 

An example of the Perspectives representation of the SP-Attached server is
shown in Figure 87 on page 172.

Figure 87. Example of Perspectives with SP-Attached server

The monitored resource variables are handled the same as for standard SP 
nodes. Operations, status, frame, and node information are handled the same 
as for standard SP nodes. 

Only the Hardware Perspective (sphardware) GUI is affected by the new
SP-Attached server nodes. The remaining panels, Partitioning Aid
Perspective (spsyspar), Performance Monitoring Perspective (spperfmon),
Event Perspective (spevent), and VSD Perspective (spvsd) are all similar to
the sphardware Perspective node panel since they are based on the same
class. Therefore, the pixmap’s placement will be similar to that of the
sphardware Perspective node panel.

Event Manager 

With the new SP-Attached server nodes, new resource variables are required 
to support their unique information. 

These new resource variables will be integrated into the Hardmon Resource 
Monitor for the SP-Attached server: 

• IBM.PSSP.SP_HW.Node.SRChasMessage 

• IBM.PSSP.SP_HW.Node.SPCNhasMessage

• IBM.PSSP.SP_HW.Node.src 

• IBM.PSSP.SP_HW.Node.spcn 

In order to recognize these new resource variables, the Event Manager must 
be stopped and restarted on the CWS and all the nodes in the affected 
system partition. 

5.5.1.1 System management 

The various system management commands that display new SDR attributes 
for SP-Attached servers are: 

• spmon

Figure 88 outlines the spmon -a -g output in an SP system
that consists of an SP Frame and an SP-Attached server.

1. Checking server process
     Process 11454 has accumulated 9 minutes and 27 seconds.
     Check ok

2. Opening connection to server
     Connection opened
     Check ok

3. Querying frame(s)
     2 frame(s)
     Check ok

4. Checking frames

        Controller  Slot 17  Switch  Switch     Power supplies
Frame   Responds    Switch   Power   Clocking   A    B    C    D
  1     yes         no       N/A     N/A        on   N/A  N/A  N/A
  2     yes         no       N/A     N/A        N/A  N/A  N/A  N/A

5. Checking nodes

---------------------------------- Frame 1 ----------------------------------
Frame  Node    Node          Host/Switch  Key     Env   Front Panel     LCD/LED is
Slot   Number  Type   Power  Responds     Switch  Fail  LCD/LED         Flashing
  1      1     high   on     yes  no      normal  no    LCDs are blank  no
  5      5     high   on     yes  no      normal  no    LCDs are blank  no
  9      9     high   on     yes  no      normal  no    LCDs are blank  no
 13     13     high   on     yes  no      normal  no    LCDs are blank  no

---------------------------------- Frame 2 ----------------------------------
Frame  Node    Node          Host/Switch  Key     Env   Front Panel     LCD/LED is
Slot   Number  Type   Power  Responds     Switch  Fail  LCD/LED         Flashing
  1     17     extrn  on     no   no      normal  no                    no
                                                        LCD2 is blank


Figure 88. The output of the spmon command

• splstdata

Figure 89 on page 175 is the output of splstdata -n. It shows two
frames. Figure 90 on page 175 shows the output from splstdata -f
where the S70 is shown as a second frame. Figure 91 on page 176
shows the hardware description of each node in the SP system.

• The SP frame has frame number 1 with four high nodes of node
numbers 1, 5, 9, and 13, each occupying four slots.

• The SP-Attached server has frame number 2, with one node,
node_number 17, occupying one slot.


List Node Configuration Information

node# frame# slot# slots initial_hostname  reliable_hostname  dcehostname
      default_route  processor_type  processors_installed  description

    1      1     1     4 c60n01.ppd.pok.i  c60n01.ppd.pok.i   ""
      9.114.88.94    MP              4                     112_MHz_SMP_High

    5      1     5     4 c60n05.ppd.pok.i  c60n05.ppd.pok.i   ""
      9.114.88.94    MP              4                     7 5_MHz_SMP_High

    9      1     9     4 c60n09.ppd.pok.i  c60n09.ppd.pok.i   ""
      9.114.88.94    MP              4                     7 5_MHz_SMP_High

   13      1    13     4 c60n13.ppd.pok.i  c60n13.ppd.pok.i   ""
      9.114.88.94    MP              4                     112_MHz_SMP_High

   17      2     1     1 c60tpln02.ppd.po  c60tpln02.ppd.po   ""
      9.114.88.1     MP              ""


Figure 89. splstdata -n output 

Figure 90 is the output of splstdata -f, which shows two frames: 


List Frame Database Information

frame#  tty        s1_tty     frame_type  hardware_protocol
1       /dev/tty0  ""         switch      SP
2       /dev/tty1  /dev/tty2  ""          SAMI


Figure 90. splstdata -f output 


Figure 91 on page 176 is the output of spgetdesc -u -a, which shows
the hardware description obtained from the Node class.

spgetdesc: Node 1 (c188n01.ibm.com) is a Power3_SMP_Wide.
spgetdesc: Node 5 (c188n05.ibm.com) is a 332_MHz_SMP_Thin.
spgetdesc: Node 9 (c188n09.ibm.com) is a 332_MHz_SMP_Thin.
spgetdesc: Node 13 (c188n13.ibm.com) is a Power3_SMP_Wide.
spgetdesc: Node 17 (c187-S70.ibm.com) is a 7017-S70.


Figure 91. spgetdesc -u -a output 


5.6 Attachment scenarios 

The following sections describe the different attachment scenarios of the 
SP-Attached server to the SP system, but they do not show all the cable 
attachments between the SP frame and the SP-Attached server.

Scenario 1: SP-Attached server to a one-frame SP system 

This scenario shows a single frame system with 14 thin nodes located in slots
one through 14. The system has two unused node slots in positions 15 and 16.
These two empty node slots have corresponding switch ports that provide
valid connections for the RS/6000 SP attachment adapter.

Figure 92. Scenario 1: SP-Attached server and one SP frame

Scenario 2: SP-Attached server to a two-frame SP system 

This scenario shows a two-frame system with four high nodes in each frame.
This configuration will use eight switch ports and leave eight valid switch
ports available for future scalability. Therefore, it is important that the frame
number assigned to the S70 allows for extra non-switched frames (in this
example, frames three and four), as the S70 frame must be attached to the
end of the configuration. On this basis, the S70 frame number must be at the
very least five to allow for the two possible non-switched frames.

Figure 93. Scenario 2: SP-Attached server to two SP frames

Note that the switch cable from frame one connects to the S70; for example,
in this case, slot one frame five connects to switch port three of switch chip
five.

Scenario 3: One SP frame and multiple SP-attached servers 

This scenario illustrates three important considerations: 

1. At least one node in a frame is required to attach one or more
SP-Attached servers to an SP system, because the SP-Attached server
cannot be the first frame in an SP environment.

2. The SP-Attached server cannot interfere with the frame numbering of the
expansion frames and, therefore, is always at the end of the chain.

3. A switch port number must be allocated to each SP-Attached server even 
though the SP system is switchless. 

In this example, the first frame has a single thin node only, which is
mandatory for any number of SP-Attached servers.

Figure 94. Scenario 3: SP frame and multiple SP-Attached servers

Scenario 4: Non-contiguous SP-attached server configuration 

Frames one and three of the SP system are switch-configured. Frame two is a
non-switched expansion frame attached to frame one. In this configuration, 
the SP-Attached server could be given frame number four, but that would 
forbid any future attachment of non-switched expansion frames to frame one’s 
switch. If, however, you assigned the SP-Attached server frame number 15, 
your system could still be scaled using other switch-configured frames and 
non-switched expansion frames. 

Frame three is another switch-configured frame, and the SP-Attached server
has previously been assigned frame number 10 for future scalability
purposes.

Figure 95. Scenario 4: Non-contiguous SP-Attached server

For more information, see RS/6000: Planning Volume 2, GA22-7281.


5.7 Related documentation 

These documents will help you to understand the concepts and examples 
covered in this chapter in order to maximize your chances of success in the 
exam. 

SP Manuals 

Chapter 15 "SP-Attached Servers" in RS/6000: Planning Volume 2, 
GA22-7281, provides some additional information regarding SP-Attached 
servers. 

SP Redbooks 

Chapter 4 "SP-Attached Server Support" in PSSP 3.1 Announcement, 
SG24-5332, provides some additional information on this topic. 


5.8 Sample questions 

This section provides a series of questions to help you prepare for
the certification exam. The answers to these questions can be found in
Appendix A.

1. There must be three connections between the control workstation and any
SP-Attached server. These are:

A. A serial (RS-232), an Ethernet and an SP Switch connection. 

B. A serial (RS-232), an Ethernet and a ground connection. 

C. Two serial (RS-232) and a ground connection. 

D. Two serial (RS-232) and an Ethernet connection. 

2. An SP-Attached server is considered a node and also a frame. Which of 
the following statements are false? 

A. The node number for the SP-Attached server is calculated based on 
the frame number. 

B. The frame number assigned to an SP-Attached server cannot be 1. 

C. The SP-Attached server cannot be installed between two switched 
frames. 

D. The SP-Attached server cannot be installed between a switched frame 
and its expansion frames. 

3. The SP-Attached servers are considered standard nodes. However, there 
are some minor restrictions regarding system management. Which of the 
following statements are true? 

A. The SP-Attached server does not have a frame or node supervisor 
card, which restricts console access to a single session.

B. The SP-Attached server does not have a frame or node supervisor 
card, which limits the full hardware support, control, and monitoring 
capabilities of the server from the control workstation. 

C. The control workstation should have enough spare serial ports to 
connect the SP-Attached server. An additional 16-port adapter may be
required in order to provide the extra serial ports. 

D. The SP-Attached server does not have a frame or node supervisor 
card, which restricts installation of SP-Attached servers to one at a
time.

4. The s70d daemon runs on the control workstation and communicates with 
the SP-Attached server for hardware control and monitoring. Which of the 
following statements are false? 

A. The s70d is partition-sensitive, so there will be one s70d daemon per
SP-Attached server per partition running on the control workstation. 

B. The s70d daemon is started and controlled by the hardmon daemon.

C. The s70d daemon uses the SAMI protocol to connect to the SP-Attached
server’s front panel. 

D. One s70d daemon per SP-Attached server runs on the control 
workstation. 

5. When connecting the SP-Attached server to the SP Frame, which of the 
following statements is NOT true? 

A. The tall frame with the eight port switch is not allowed. 

B. The SP system must be a tall frame as the 49 inch short LowBoy 
frames are not supported for the SP-attachment. 

C. A maximum of eight SP-Attached servers are supported in one SP
system.

D. The SP-Attached server can be the first frame in the SP system.

6. Which of the following provides a mechanism to control and monitor 
SP-Attached servers? 

A. SAMI 

B. Hardmon 

C. MACN 

D. MAC address 

7. Which of the following are the minimum PSSP and AIX requirements for an
SP-Attached server? 

A. PSSP 2.4 and AIX 4.2.1 

B. PSSP 2.4 and AIX 4.3.2 

C. PSSP 3.1 and AIX 4.3.2 

D. PSSP 3.1 and AIX 4.2.1 

8. Once the frame number has been assigned, the server’s node numbers, 
which are based on the frame number, are automatically generated. Which 
of the following statements about the system defaults is true?

A. The SP-Attached server occupies the slot one position. 

B. The SP-Attached server is viewed as a single frame containing many 
nodes. 

C. The SP-Attached server occupies the 17th slot position. 

D. The SP-Attached server is viewed as a multiple frame system. 

9. For the S70 server, which of the following adapters is supported for
SP-LAN communications?

A. 100 Mbps AUI Ethernet

B. 10 Mbps BNC 

C. 100 Mbps BNC 

D. ATM 155 TURBOWAYS UTP 

10. After configuring an SP-Attached server into your SP system, which
system management command displays information about the frames 
already installed? 

A. splstframe 

B. spchvgobj 

C. spbootins 

D. splstdata 


5.9 Exercises 

Here are some exercises you may wish to perform: 

1. Describe the necessary steps to change the switch port number of an 
SP-Attached server. 

2. What are the necessary steps to add an existing SP-Attached server and 
preserve its current software environment? 

3. Familiarize yourself with the various SP-Attached server scenarios in 
section 5.6, “Attachment Scenarios” on page 176. 





Chapter 6. SP security 

This chapter covers security facilities available to the SP system 
environment. 

Three common security concepts are defined: identification,
authentication, and authorization.

Emphasis is placed on Kerberos, which is a security service included in
PSSP. The chapter defines Kerberos and discusses how the Kerberos
authentication service works, how to set up Kerberos, and how to manage it.

Two other Kerberos-based security systems that may be used on the SP 
system are also discussed. These are AFS authentication and sysctl 
authorization services on the SP. 


6.1 Key concepts you should study 

The key concepts of security on the SP system are listed below in order of 
importance: 

• Concepts of using Kerberos for authentication services on the SP system. 
These include client/server activities, principals, realms, and tickets. 

• Procedures for managing Kerberos, covering adding and deleting
principals and authentication administrators.

• Concepts in AFS authentication management and its usage of a different 
set of protocols, utilities, daemons, and interfaces to manage the principal 
database. 

• Concepts of sysctl as an SP Kerberos-based client/server system that 
runs commands remotely and in a parallel fashion. 

• Procedures of sysctl authorization. 

• Understanding how Kerberos provides better security services than 
standard AIX security.

Recommended reading can be found in 6.15, “Related documentation” on 
page 214. 


© Copyright IBM Corp. 2000


6.2 Security-related concepts

There are three security-related concepts. The following is a brief description 
of the three concepts and how they may be applied to the SP system 
environment. 

1. Identification: This is a process by which one entity tells another who it 
is, that is, its identity. In the SP system environment, identification 
simply means a process that presents client identity credentials. 

2. Authentication: This is a process by which one entity verifies the 
identity of another. Identities and credentials are checked, but it does 
not add or restrict functions. In the SP system environment, 
authentication simply means a service requester’s name and encrypted 
password are checked with the usage of available system utilities. 

3. Authorization: This process involves defining the functions that a user 
or process is permitted to perform. In the SP system environment, 
authorization simply means a service requester is granted permission 
to do a specific action, for example, execute commands remotely. 

In a system environment, the server first identifies and authenticates the 
client and then checks its authorization for the function requested. 
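
The order of these checks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PASSWORDS and PERMISSIONS tables, the Request type, and the handle function are hypothetical names, and a plain password comparison stands in for whatever authentication service is really in use.

```python
from dataclasses import dataclass

# Hypothetical in-memory tables (assumption); a real SP system would consult
# Kerberos and the service's own access control files instead.
PASSWORDS = {"alice": "s3cret"}
PERMISSIONS = {"alice": {"run_remote_command"}}


@dataclass
class Request:
    claimed_user: str  # identification: who the client says it is
    password: str      # credential presented for authentication
    action: str        # function the client wants to perform


def handle(request: Request) -> str:
    # Authentication: verify the claimed identity. This grants nothing by itself.
    if PASSWORDS.get(request.claimed_user) != request.password:
        return "authentication failed"
    # Authorization: check whether this identity may perform the action.
    if request.action not in PERMISSIONS.get(request.claimed_user, set()):
        return "not authorized"
    return "ok"
```

A request with a valid password but an action outside the user's permission set passes authentication and still fails authorization, which is the distinction the three definitions draw.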

In an SP system, there are at least two levels of security: AIX and PSSP. 

Kerberos, which comes bundled with PSSP, performs authentication in SP
environments.


6.3 AIX security 

AIX provides the basic security elements to control user access to files, 
directories, and networks. Details of AIX security can be obtained in the 
redbook named Elements of Security: AIX 4.1, GG24-4433. 

However, a machine may be programmed to send information across the 
network impersonating another machine (which means assuming the identity 
of another machine or another user). One way to protect the machines and 
users from being impersonated is to authenticate the packets when travelling 
within the network. Kerberos, which is included in PSSP, can provide such 
authentication services to the SP system environment.

6.3.1 Secure remote execution commands

In AIX 4.3.1, the commands telnet and ftp, as well as the r-commands rcp,
rlogin, and rsh, have been enhanced to support multiple authentication
methods (note that rexec is not included in this list). In earlier releases, the 
standard AIX methods were used for authentication and authorization: 

telnet  The telnet client establishes a connection to the server, which
        then presents the login screen of the remote machine, typically by
        asking for userid and password. These are transferred over the
        network and are checked by the server’s login command. This
        process normally performs both authentication and authorization.

ftp     Again, userid and password are requested. Alternatively, the login
        information for the remote system (the server) can be provided in
        a $HOME/.netrc file on the local machine (the client), which is
        then read by the ftp client rather than querying the user. This
        method is discouraged since plain text passwords should not be
        stored in the (potentially remote) file system.

rexec   Same as ftp. As mentioned above, use of $HOME/.netrc files is
        discouraged.

The main security concern with this authentication for the above commands 
is the fact that passwords are sent in plain text over the network. They can be 
easily captured by any root user on a machine that is on the network(s) 
through which the connection is established. 

rcp, rlogin, and rsh

The current user name (or a remote user name specified as a 
command line flag) is used, and the user is prompted for a 
password. Alternatively, a client can be authenticated by its IP 
name/address if it matches a list of trusted IP names/addresses 
that are stored in files on the server. 

• /etc/hosts.equiv lists the hosts from which incoming (client) connections 
are accepted. This works for all users except root (UID=0). 

• $HOME/.rhosts lists additional hosts, optionally restricted to specific 
userids, which are accepted for incoming connections. This is on a 
per-user basis and also works for the root user. 

Here, the primary security concern is host impersonation. It is relatively easy 
for an intruder to set up a machine with an IP name/address listed in one of 
these files and gain access to the system. Of course, if a password is 
requested rather than using $HOME/.rhosts or /etc/hosts.equiv files, this is 
also normally sent in plain text. 
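
The trust rules above can be modelled with a short Python sketch. It is a simplified illustration, not the actual AIX implementation: the function name is_trusted and its parameters are hypothetical, and it assumes that a remote user must have the same name as the local user unless a .rhosts entry names a different one.

```python
def is_trusted(client_host, remote_user, local_user, local_uid,
               hosts_equiv, rhosts):
    """Simplified model of the standard r-command trust check.

    hosts_equiv: set of host names taken from /etc/hosts.equiv
    rhosts: list of (host, user_or_None) pairs from the local user's
            $HOME/.rhosts; None means "same user name as the local user"
    """
    # /etc/hosts.equiv applies to every user except root (UID 0).
    if (local_uid != 0 and client_host in hosts_equiv
            and remote_user == local_user):
        return True
    # $HOME/.rhosts is evaluated per user and also works for root.
    for host, user in rhosts:
        if host == client_host and (user or local_user) == remote_user:
            return True
    return False
```

The decision depends only on the name the client presents, which is why a machine that impersonates a listed host is accepted by this kind of check.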

With AIX V4.3.1, all these commands, except rexec, also support Kerberos
Version 5 authentication. The base AIX operating system does not include 
Kerberos. It is recommended that DCE for AIX Version 2.2 is used to provide 
Kerberos authentication. Note that previous versions of DCE did not make 
the Kerberos services available externally. However, DCE for AIX 
Version 2.2, which is based on OSF DCE Version 1.2.2, provides the 
complete Kerberos functionality as specified in RFC 1510, The Kerberos 
Network Authentication Service (V5). 

For backward compatibility with PSSP 3.1 (which still requires Kerberos 
Version 4 for its own commands), the AIX r-commands, rcp and rsh, also
support Kerberos Version 4 authentication. See 6.5, “How Kerberos works” 
on page 191 for details on Kerberos. 

Authentication methods for a machine are selected by the AIX chauthent 
command and can be listed with the lsauthent command. These commands
call the library routines set_auth_method() and get_auth_method(), which are 
contained in a new library, libauthm.a. Three options are available: chauthent 
-std enables standard AIX authentication; chauthent -k5 and chauthent -k4
enable Version 5 or 4 Kerberos authentication. More than one method can be 
specified, and authenticated applications/commands will use them in the 
order specified by chauthent until one is successful (or the last available 
method fails, in which case, access is denied). If standard AIX authentication 
is specified, it must always be the last method. 
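
The ordering rule can be illustrated with a small Python sketch. The method labels k5, k4, and std mirror the chauthent flags; the functions validate_methods and authenticate, and the attempt callback, are hypothetical helpers introduced only for this example.

```python
ALLOWED = {"k5", "k4", "std"}


def validate_methods(methods):
    """Check an ordered method list against the rule stated above."""
    if not methods or any(m not in ALLOWED for m in methods):
        return False
    # Standard AIX authentication, if present, must be the last method.
    return "std" not in methods[:-1]


def authenticate(methods, attempt):
    """Try each method in order. 'attempt' is a hypothetical callback that
    returns True when the given method succeeds."""
    for method in methods:
        if attempt(method):
            return method
    return None  # the last available method failed, so access is denied
```

For example, with the list k5, k4, std and a client that only holds Kerberos Version 4 credentials, the first attempt fails and the second succeeds, so the standard AIX method is never tried.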

- Note - 

On the SP, the chauthent command should not be used directly. The 
authentication methods for SP nodes and the control workstation are 
controlled by the partition-based PSSP commands chauthpar and 
lsauthpar. Configuration information is stored in the Syspar SDR class in
the auth_install, auth_root_rcmd and auth_methods attributes. 


If Kerberos Version 5 is activated as an authentication method, the telnet 
connection is secured by an optional part of the telnet protocol specified in 
RFC 1416, Telnet Authentication Option. Through this mechanism, clients 
and servers can negotiate the authentication method. The ftp command uses 
another mechanism: Here the authentication between client and server takes 
place through a protocol specified in RFC 1508, Generic Security Service 
API. These extensions are useful but have no direct relation to SP system 
administration. An impediment to widespread use of these facilities is that 
they rely on all clients being known to the Kerberos database, including the 
clients’ secret passwords.

The kerberized rsh and rcp commands are of particular importance for the SP
as they replace the corresponding Kerberos Version 4 authenticated
r-commands which have been part of PSSP Versions 1 and 2. Only the PSSP
versions of rsh, rcp, and kshd have been removed from PSSP V3.1, which still
includes and uses the Kerberos Version 4 server. This Kerberos server can
be used to authenticate the AIX r-commands. A full description of the 
operation of the rsh command in the SP environment can be found in 6.12.2, 
“Remote execution commands” on page 204, including all three possible 
authentication methods. 


6.4 Defining Kerberos 

Kerberos can be used to prevent machine impersonation by means of 
authenticating the packets in a two-party communication. 

Kerberos is a service for authenticating users in a network environment. It 
consists of a set of distributed software with encrypted exchanges of 
information to allow a user access to servers. It also provides for 
cryptographic checks to make sure that data passing between workstations 
and servers is not corrupted either by accident or by tampering. 

6.4.1 AFS and Sysctl are Kerberos-based security systems 

Both AFS and sysctl are Kerberos-based security systems that may be used
on the SP system. 

AFS is a distributed file system. Since AFS includes Kerberos Version 4, SP 
systems may use an AFS Kerberos authentication server instead of SP 
servers. However, AFS uses a different set of protocols, utilities, daemons, 
and interfaces for principal database administration. 

Sysctl is an SP Kerberos-based client/server system designed to run 
commands remotely and in a parallel fashion with a high degree of 
authentication. 

Further descriptions of the AFS and Sysctl security systems are included in 
the later parts of this chapter. 

6.4.2 Main reasons for using Kerberos on the SP 

The main reasons for using Kerberos are: 

• To prevent unauthorized access to the system. 

• To prevent non-encrypted passwords from being passed on the network.

• To provide security on remote commands, such as rsh, rcp, dsh, and sysctl.
These commands are described in the following table.


Table 9. Some Kerberos authenticated commands

Command
    Description

spmon
    Controls and monitors SP system activity through the hardware
    monitor, hardmon. Replaced by Perspectives on PSSP V3.1.

rsh
    rsh is the remote shell command. On PSSP V3.1, this command is no
    longer SP provided. The /usr/lpp/ssp/rcmd/bin/rsh command is linked
    to the Berkeley command, /usr/bin/rsh (which uses .rhosts file).

rcp
    rcp is remote copying of files between local and remote hosts. On
    PSSP V3.1, this command is no longer SP provided. The
    /usr/lpp/ssp/rcmd/bin/rcp command is now linked to the Berkeley
    command, /usr/bin/rcp.

dsh
    Can be issued to groups of SP nodes at the same time. For example,
    dsh -w sp3n05 sp3n06.
    dsh is not interactive. Therefore, telnet, rcp, rsh, and so forth,
    may be used.

sysctl
    Uses the SP authentication service. When the client issues the
    sysctl command, a Kerberos ticket will be sent to the server to
    validate the identity of the client.


6.4.3 Kerberos terms 

The following table lists basic Kerberos terms.


Table 10. Basic Kerberos terms

Term
    Description

Principal
    A Kerberos user or Kerberos ID. That is, a user who requires
    protected service.

Instance
    The authority granted to the Kerberos user.

    Example for usage with a user: In root.admin, root is the principal,
    and admin is the instance, which represents Kerberos authorization
    for administrative tasks.

    Example for usage with a service: In hardmon.sp3en0, hardmon
    represents the hardware monitor service, and sp3en0 represents the
    machine providing the service.

Realm
    A collection of systems managed as a single unit. The default name
    of the realm on an SP system is the TCP/IP domain name converted to
    uppercase. If DNS is not used, then the CWS hostname is converted
    to uppercase.

Authentication Server (primary and secondary)
    Host with the Kerberos database. This host provides the tickets for
    the principals to use.

    When the setup_authent program is run, authentication services are
    initialized. At this stage, a primary authentication server must be
    nominated (this may be the CWS). A secondary authentication server
    may then be created later to serve as a backup server.

Ticket
    An encrypted packet required for use of a Kerberos service. The
    ticket contains the identity of the user. Tickets are by default
    stored in the /tmp/tkt<client’s user ID> file.

Ticket-Granting Ticket (TGT)
    Initial ticket given to the Kerberos principal. The authentication
    server site uses it to authenticate the Kerberos principal.

Service Ticket
    Secondary ticket that allows access to certain server services,
    such as rsh and rcp.

Ticket Cache File
    File that contains the Kerberos tickets for a particular Kerberos
    principal and AIX ID.

Service Keys
    Used by the server sites to unlock encrypted tickets in order to
    verify the Kerberos principal.
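
As a small illustration of the naming conventions in Table 10, the following Python sketch splits a principal identifier into its parts and derives the default realm name. The function names parse_principal and default_realm are hypothetical, and the sketch assumes the name.instance@realm form with optional instance and realm.

```python
def parse_principal(text):
    """Split 'name.instance@REALM' into (name, instance, realm).
    Instance and realm are optional and come back as empty strings."""
    name_part, _, realm = text.partition("@")
    name, _, instance = name_part.partition(".")
    return name, instance, realm


def default_realm(cws_hostname, dns_domain=None):
    """Default realm as described in Table 10: the TCP/IP domain name in
    uppercase, or the CWS host name in uppercase when DNS is not used."""
    return (dns_domain or cws_hostname).upper()
```

With these helpers, root.admin yields the name root and the instance admin, and a control workstation in the domain msc.itso.ibm.com gets the default realm MSC.ITSO.IBM.COM.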


6.5 How Kerberos works 

Kerberos authenticates information exchanged over the network. There are 
three daemons that deal with the Kerberos services. 6.5.2, “Kerberos 
authentication process” on page 192 illustrates the Kerberos authentication 
process. 

6.5.1 Kerberos daemons 

The three Kerberos daemons are as follows: 

kerberos: This daemon runs only on the primary and secondary
          authentication servers. It handles getting ticket-granting and
          service tickets for the authentication clients. There may be more
          than one kerberos daemon running on the realm to provide faster
          service, especially when there are many client requests.

kadmind:  This daemon runs only on the primary authentication server
          (usually the CWS). It is responsible for serving the Kerberos
          administrative tools, such as changing passwords and adding
          principals. It also manages the primary authentication database.

kpropd:   This daemon runs only on secondary authentication database
          servers. When the daemon receives a request, it synchronizes the
          Kerberos secondary server database. The databases are
          maintained by the kpropd daemon, which receives the database
          content in encrypted form from a program, kprop, which runs
          on the primary server.

6.5.2 Kerberos authentication process 

Three entities are involved in the Kerberos authentication process: the client,
the server, and the authentication database server. The following is an
example of authentication:

1. The client (Host A) issues the kinit command to request a
ticket-granting ticket (TGT) in order to perform the rcp command on the
destination host (Host B).

For example: Issue the command lines kinit root.admin and rcp sp3en0:file

2. The authentication database server (Host C) that is the Key Distribution 
Center (KDC) performs authentication tasks. If information is valid, then it 
will issue a service ticket to the client (Host A). 

3. The client (Host A) then sends the authentication and service ticket to the 
server (Host B). 

4. The kshd daemon on the server (Host B) receives the request and 
authenticates it using one of the service keys. It then authorizes a 
Kerberos principal through the .klogin file to perform the task. The results 
of the rcp command will then be sent to the client (Host A).
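
Steps 2 to 4 can be mimicked with a toy Python model. This is a sketch under strong assumptions and not the Kerberos Version 4 protocol: an HMAC tag stands in for DES encryption, the TGT exchange of step 1 and the authenticator are omitted, and the names seal, unseal, SERVICE_KEYS, issue_service_ticket, and server_accepts are all hypothetical.

```python
import hashlib
import hmac


def seal(key: bytes, payload: str) -> str:
    """Toy stand-in for encryption: tag the payload with an HMAC so that only
    a holder of the key can produce or verify it."""
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{tag}"


def unseal(key: bytes, ticket: str):
    """Return the payload if the tag matches the key, otherwise None."""
    payload, _, tag = ticket.rpartition("|")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return payload if hmac.compare_digest(tag, expected) else None


# Host C: the authentication server knows the key of every service.
# The key value is a made-up placeholder.
SERVICE_KEYS = {"rcmd.sp3en0": b"hypothetical-service-key"}


def issue_service_ticket(principal: str, service: str) -> str:
    """Step 2: the KDC issues a service ticket naming the client principal."""
    return seal(SERVICE_KEYS[service], f"{principal}>{service}")


def server_accepts(ticket: str, service: str, service_key: bytes,
                   klogin: set) -> bool:
    """Step 4: the server verifies the ticket with its own service key, then
    authorizes the principal against the entries of .klogin."""
    payload = unseal(service_key, ticket)
    if payload is None:
        return False
    principal, _, target = payload.partition(">")
    return target == service and principal in klogin
```

The point of the model is that Host B never contacts Host C: it trusts the ticket because only the authentication server and Host B hold the service key, and it then makes a separate authorization decision from .klogin.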


6.6 Kerberos paths, directories, and files 

The Kerberos directories and files are located in the following paths.

For PSSP 2.4:

PATH=/usr/lpp/ssp/rcmd/bin:$PATH:/usr/lpp/ssp/bin:/usr/lpp/ssp/kerberos/bin:/usr/lpp/ssp/kerberos/etc

For PSSP V3.1:

PATH=$PATH:/usr/lpp/ssp/bin:/usr/lpp/ssp/kerberos/bin:/usr/lpp/ssp/kerberos/etc

MANPATH=$MANPATH:/usr/lpp/ssp/man:/etc

Table 11 displays the Kerberos directories and files on the Primary 
Authentication Server, which is usually the control workstation (CWS). 


Table 11. Kerberos directories and files on Primary Authentication Server

Directories and Files
    Description

/.k
    The master key cache file. Contains the DES key derived from the
    master password. The DES key is saved in the /.k file using the
    /usr/lpp/ssp/kerberos/etc/kstash command. The kadmind daemon reads
    the master key from this file instead of prompting for the master
    password.

    After changing the master password, enter the kstash command to
    kill and restart the kadmind daemon and to recreate the /.k file
    with the new master key stored in it.

$HOME/.klogin
    Contains a list of principals, for example, name.instance@realm.
    Listed principals are authorized to invoke processes as the owner
    of this file.

/tmp/tkt<uid>
    Contains the tickets owned by a client (user). The first ticket in
    the file is the TGT.
    The kinit command creates this file.
    The klist command displays the contents of the current cache file.
    The kdestroy command deletes the current cache file.

/etc/krb-srvtab
    Contains the names and private keys of the local instances of
    Kerberos protected services. Every node and the CWS has an
    /etc/krb-srvtab file that contains the keys for the services
    provided on that host. On the CWS, the hardmon and rcmd service
    principals are in the file. They are used for SP system management
    and administration.

/etc/krb.conf
    The first line contains the name of the local authentication realm.
    Subsequent lines specify the authentication server for a realm.
    For example:

    MSC.ITSO.IBM.COM
    MSC.ITSO.IBM.COM sp3en0.msc.itso.ibm.com admin server

/etc/krb.realms
    Maps a host name to an authentication realm for the services
    provided by that host. Example of forms:

    host_name realm_name
    domain_name realm_name

    These are created by the setup_authent script on the primary
    authentication server.

/var/kerberos/database/*
    This directory includes the authentication database created by
    setup_authent. Files residing in this directory include
    principal.pag and principal.dir, and the access control lists for
    kadmin, which are admin_acl.add, admin_acl.mod, and admin_acl.get.

/var/adm/SPlogs/kerberos/kerberos.log
    This file records the kerberos daemon’s process IDs and messages
    from activities.


Kerberos directories and files on the nodes are: 
$HOME/.klogin 
/etc/krb-srvtab 
/etc/krb.conf 
/etc/krb.realms 
/tmp/tkt<uid> 
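
The /etc/krb.conf layout described in Table 11 is simple enough to read with a few lines of Python. The sketch below is illustrative only: parse_krb_conf is a hypothetical helper, and it assumes well-formed input in which every line after the first has at least a realm and a host name.

```python
def parse_krb_conf(text):
    """Return (local_realm, servers), where servers maps each realm to a
    list of (host, is_admin_server) tuples."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    local_realm = lines[0][0]  # the first line names the local realm
    servers = {}
    for fields in lines[1:]:
        realm, host = fields[0], fields[1]
        # A trailing "admin server" marks the host that runs kadmind.
        is_admin = fields[2:] == ["admin", "server"]
        servers.setdefault(realm, []).append((host, is_admin))
    return local_realm, servers
```

Applied to the two example lines shown for /etc/krb.conf, the function reports MSC.ITSO.IBM.COM as the local realm and sp3en0.msc.itso.ibm.com as its admin server.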


6.7 Authentication services procedures 

This section gives an overview of required procedures to perform Kerberos 
authentication services. 

1. Set up user accounts so that Kerberos credentials can be obtained
whenever a user logs in.

• Add the name of the program that will execute the kinit command for
the users in the /etc/security/login.cfg file. For example:

program=/usr/lpp/ssp/kerberos/bin/k4init <program name>

• Update the auth1 or auth2 attribute in the /etc/security/user file for each
user account. For example: auth1=SYSTEM,Kerberos;root.admin

2. Log in to SP Kerberos authentication services.

• Use the k4init <principal> command to obtain a ticket-granting ticket.
For example, enter k4init root.admin

• Enter the password.

3. Display the authentication information.

• Enter the command: k4list

4. Delete Kerberos tickets.

• Enter the command: k4destroy

• Verify that the tickets have been destroyed by entering the command
k4list again.


6.8 Kerberos passwords and master key 

The following describes the initial setup of passwords on the primary
authentication server:

During the installation stage, the setup_authent command is entered to 
configure the SP authentication services on the control workstation (CWS) 
and other RS/6000 workstations connected to the SP system. The 
setup_authent command gives an interactive dialog that prompts for two 
password entries: 

• Master password (then the encrypted Kerberos master key will be 
written in the /.k file) 

• Administrative principal’s password

Change a principal’s password:

Enter the kpasswd command to change a Kerberos principal’s password.
For example, to change the password of the current user, use the kpasswd
command.

Change Kerberos master password: 

1. Log in to Kerberos as the initial admin principal by entering the
command: k4init root.admin

2. Change the password by entering the following command lines. The
kdb_util command is used here to change the master key:

kdb_util new_master_key /var/kerberos/database/newdb.$$
kdb_util load /var/kerberos/database/newdb.$$
kdb_util load /var/kerberos/database/newdb.$$ 

3. Replace the /.k file by entering the kstash command. This will store the 
new master key in the /.k file. 

4. Kill and respawn the server daemons by entering the following 
command lines: 

stopsrc -s kerberos 
startsrc -s kerberos
stopsrc -s kadmind
startsrc -s kadmind 
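
Because the order of these steps matters, it can help to write the procedure down as a checklist before running it. The Python sketch below only builds the command strings listed above and executes nothing; master_key_change_plan is a hypothetical helper, and the pid argument is an assumed stand-in for the shell's $$ in the dump file name.

```python
def master_key_change_plan(pid: int) -> list:
    """Return the commands of the master password procedure, in order.
    Nothing is executed; 'pid' replaces the shell's $$ in the file name."""
    dump = f"/var/kerberos/database/newdb.{pid}"
    return [
        "k4init root.admin",                 # step 1: log in as admin principal
        f"kdb_util new_master_key {dump}",   # step 2: dump under the new key
        f"kdb_util load {dump}",             #         and reload the database
        "kstash",                            # step 3: rewrite the /.k file
        "stopsrc -s kerberos",               # step 4: restart both daemons
        "startsrc -s kerberos",
        "stopsrc -s kadmind",
        "startsrc -s kadmind",
    ]
```

Printing the list for review, for example with print("\n".join(master_key_change_plan(1234))), makes it easy to confirm that kstash comes before the daemons are restarted.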


6.9 Kerberos principals 

Kerberos principals are either users who use authentication services to run 
the Kerberos-authenticated applications supplied with the SP system or the 
individual instances of the servers that run on SP nodes, the control 
workstation, and on other IBM RS/6000 workstations that have network 
connections to the SP system. 

• User principals for SP system management:

An implementation of the SP system must have at least one user principal 
defined. This user is the authentication database administrator who must 
be defined first so that other principals can be added later. 

When AFS authentication servers are being used, the AFS administrator 
ID already exists when the SP authentication services are initialized. 
When PSSP authentication servers are being used, one of the steps 
included in setting up the authentication services is the creation of a 
principal whose identifier includes the admin instance. It is suggested, but 
not essential, that the first authentication administrator also be the root 
user. 

Various installation tasks performed by root, or other users with UID 0, 
require the Kerberos authority to add service principals to the 
authentication database. 

• Service principals used by PSSP components: 

Two service names are used by the Kerberos-authenticated applications 
in an SP system: 

1. hardmon, used by the System Monitor daemon on the control
workstation and by logging daemons.

2. rcmd, used by sysctl.

The hardmon daemon runs only on the control workstation. The SP 
logging daemon, splogd, can run on other IBM RS/6000 workstations. 
Therefore, for each (short) network interface name on these workstations, 
a service principal is created with the name hardmon and the network 
name as the instance. The remote commands can be run from, or to, any 
IBM RS/6000 host on which the SP system authenticated client services 
(ssp.clients) are installed. Therefore, for each (short) network interface 
name on all SP nodes, the control workstation, and other client systems, a
service principal is created with the name rcmd and the network name as
the instance. 
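
The naming rule for service principals can be sketched as follows. The helpers short_name and service_principals are hypothetical, and the runs_hardmon_or_splogd flag is an assumed way of marking the control workstation and the workstations that run logging daemons.

```python
def short_name(interface_name: str) -> str:
    """Reduce a network interface name to its short form,
    for example 'sp3en0.msc.itso.ibm.com' -> 'sp3en0'."""
    return interface_name.split(".", 1)[0]


def service_principals(interfaces, runs_hardmon_or_splogd: bool):
    """Service principals for one host, given its network interface names.

    Every host gets rcmd.<short name> for each interface. Hosts that run
    hardmon or splogd also get hardmon.<short name>.
    """
    names = sorted({short_name(name) for name in interfaces})
    principals = [f"rcmd.{name}" for name in names]
    if runs_hardmon_or_splogd:
        principals += [f"hardmon.{name}" for name in names]
    return principals
```

A node with the single interface sp3n05 therefore ends up with only rcmd.sp3n05, while a control workstation gets both an rcmd and a hardmon principal for each of its interfaces.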

6.9.1 Add a Kerberos principal 

It is desirable to allow users to perform certain system tasks. Such users
must be set up as Kerberos principals. These users may include the
following:

• Operators who use the spmon command to monitor system activities. 

• Users who require extra security on the Print Management System when 
using it in open mode. 

• System users who require partial root access. They may use the sysctl
command to perform this. However, they must be set up as a Kerberos 
principal as well. 

There are different ways to add Kerberos principals. 

1. Use the kadmin command and its subcommand add_new_key (ank for short). 
This will always prompt for your administrative password. 

2. Use the kdb_edit command. It allows the root user to enter this command 
without specifying the master key. 

3. Use the add_principal command to allow a large number of principals to
be added at one time. 

4. Use the mkkp command to create a principal. This command is 
non-interactive and does not provide the capability to set the principal’s 
initial password. The password must, therefore, be set by using the 
kadmin command and its subcommand cpw. 

5. Add an Authentication Administrator. 

• Add a principal with an admin instance by using the kadmin command 
and its subcommand add_new_key (ank for short). For example: 

kadmin 

admin: add_new_key spuser1.admin

• Add the principal identifier manually to one or more of the ACL files: 
/var/kerberos/database/admin_acl.add 
/var/kerberos/database/admin_acl.get 
/var/kerberos/database/admin_acl.mod 
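
Checking whether an administrator has been added to one of these files amounts to a lookup in a short list. The Python sketch below assumes that each ACL file holds one principal identifier per line and that entries are compared as exact strings; that layout and the acl_permits helper are assumptions made for illustration.

```python
def acl_permits(acl_text: str, principal: str) -> bool:
    """True if the principal appears on its own line in the ACL text.
    Assumes one principal identifier per line; blank lines are ignored."""
    entries = {line.strip() for line in acl_text.splitlines() if line.strip()}
    return principal in entries
```

Under these assumptions, a principal such as spuser1.admin would need to appear in the contents of admin_acl.add before it could add principals, and separately in admin_acl.mod and admin_acl.get for the other operations.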

6.9.2 Change the attributes of the Kerberos principal

To change a password for a principal in the authentication database, a PSSP 
authentication database administrator can use either the kpasswd command or 
the kadmin program's change_password subcommand. You can issue these
commands from any system running SP authentication services, and they do
not require a prior k4init.

To use the kpasswd command: 

1. Enter the kpasswd command with the name of the principal whose 
password is being changed: 

kpasswd -n name 

2. At the prompt, enter the old password. 

3. At the prompt, enter the new password. 

4. At the prompt, reenter the new password. 

To use the kadmin program: 

1. Enter the kadmin command: 

kadmin 

A welcome message and an explanation of how to ask for help are 
displayed. 

2. Enter the change_password or cpw subcommand with the name of the
principal whose password is being changed: 

cpw name 

The only required argument for the subcommand is the principal's name. 

3. At the prompt, enter your admin password. 

4. At the prompt, enter the principal's new password. 

5. At the prompt, reenter the principal's new password. 

To change your own admin instance password, you can use either the 
kpasswd command or the kadmin program's change_admin_password
subcommand. 

To use the kpasswd command: 

1. Enter the kpasswd command with your admin instance name: 

kpasswd -n name.admin 

2. At the prompt, enter your old admin password.

3. At the prompt, enter your new admin password.

4. At the prompt, reenter your new admin password. 

To use the kadmin program: 

1. Enter the kadmin command: 

kadmin 

A welcome message and explanation of how to ask for help are displayed. 

2. Enter the change_admin_password or cap subcommand: 

cap 

3. At the prompt, enter your old admin password. 

4. At the prompt, enter your new admin password. 

5. At the prompt, reenter your new admin password. 

In addition to changing the password, you may want to change either the 
expiration date of the principal or its maximum ticket lifetime, though these 
changes are less likely to be necessary. To do so, the root user on the primary 
authentication database system must use the kdb_edit command, just as 
when adding new principals locally. The command determines whether the 
principal already exists and, if so, prompts for changes to all its attributes, 
starting with the password, followed by the expiration date and the maximum 
ticket lifetime. 

Use the chkp command to change the maximum ticket lifetime and expiration 
date for Kerberos principals in the authentication database. When logged into 
a system that is a Kerberos authentication server, the root user can run the 
chkp command directly. Additionally, any users who are Kerberos database 
administrators listed in the /var/kerberos/database/admin_acl.mod file can 
invoke this command remotely through a sysctl procedure of the same name. 

The administrator does not need to be logged in on the server host to run chkp 
through sysctl but must have a Kerberos ticket for that admin principal 
(name.admin). 

6.9.3 Delete Kerberos principals 

There are two ways to delete principals: through the rmkp command or 
through the kdb_util command. 

The following procedure deletes a principal through the kdb_util 
command and its subcommands. 

1. The root user on the primary authentication server must edit a backup 
copy of the database and then reload the changed copy. For 
example, to dump the primary authentication database to 
a file named slavesave in the /var/kerberos/database directory, enter the 
command: 

kdb_util dump /var/kerberos/database/slavesave 

2. Edit the file by removing the lines for any unwanted principals. 

3. Reload the database from the backup file by entering the command: 

kdb_util load /var/kerberos/database/slavesave 
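
Step 2 of this procedure could also be scripted rather than done in an editor. The following Python sketch filters the lines of a dump file. It assumes, and this is an assumption about the dump format rather than something stated in this guide, that each record is a single line whose first two whitespace-separated fields are the principal name and instance. The function name, the principal olduser.admin, and the sample records are hypothetical.

```python
def remove_principals(dump_lines, unwanted):
    """Return the dump lines whose (name, instance) pair is not in `unwanted`.

    Assumption: each record is one line that begins with the principal
    name and instance as its first two whitespace-separated fields.
    """
    kept = []
    for line in dump_lines:
        fields = line.split()
        # Lines with fewer than two fields are kept untouched rather than
        # guessing at their meaning.
        if len(fields) >= 2 and (fields[0], fields[1]) in unwanted:
            continue
        kept.append(line)
    return kept


# Hypothetical records; real dump lines carry more fields than shown here.
sample = [
    "root admin 255 1 1 0",
    "olduser admin 255 1 1 0",
    "rcmd sp4cw0 255 1 1 0",
]
filtered = remove_principals(sample, {("olduser", "admin")})
```

Always inspect the filtered file before reloading it with kdb_util load, since the load replaces the database contents.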


6.10 Server key 

The server keys are located in the /etc/krb-srvtab file on the control 
workstation (CWS) and all the nodes. The file is used to unlock (decrypt) 
tickets coming in from clients for authentication. 

• On the CWS, hardmon and rcmd service principals are in the file. 

• On the nodes, rcmd service principals are in the file. 

• The local server key files are created on the CWS by setup_authent during 
installation when authentication is first set up. 

• The setup_server script creates server key files for nodes and stores them in 
the /tftpboot directory for network booting. 

• Service key information may be changed by using the command: 

ksrvutil change 

• Service key information may be displayed by one of the following 
command lines. They display information such as the key version 
number, the service and its instance, and the realm name in some form. 

To view the local key file /etc/krb-srvtab, use: 

ksrvutil list 
k4list -srvtab 

To view a key file stored elsewhere, such as a node's key file in /tftpboot, use: 

ksrvutil list -f /tftpboot/sp31nl-new-srvtab 


6.10.1 Change a server key 

A security administrator will decide how frequently service keys need to be 
changed. 

The ksrvutil command is used to change service keys. 

6.11 Using additional Kerberos servers 

Secondary Kerberos authentication servers can improve security by providing 
backup to the primary authentication server and network load balancing. The 
kerberos and kpropd daemons run on the secondary servers. 

The tasks related to the Kerberos secondary servers are: 

• Setting up a secondary Kerberos server. 

• Managing the Kerberos secondary server database. 

6.11.1 Set up and initialize a secondary Kerberos server 

The following example provides the procedures to set up and initialize a 
secondary authentication server. 

1. On the primary server, add a line to the /etc/krb.conf file listing this host 
as a secondary server. 

2. Copy the /etc/krb.conf file from the primary authentication server to the 
secondary server. 

3. Copy the /etc/krb.realms file from the primary server to the secondary 
server. 

4. Run the setup_authent program following the prompt for a secondary 
server. (Note: It will also prompt you to log in as the same administrative 
principal name as defined when the primary server was set up.) The 
remainder of the initialization of authentication services on this secondary 
system takes place automatically. 

5. After setup_authent completes, add an entry for the secondary 
authentication server to the /etc/krb.conf file on all SP nodes on which you 
have already initialized authentication. 

6. If this is the first secondary authentication server, you should create a root 
crontab entry on the primary authentication server that invokes the script 
/usr/kerberos/etc/push-kprop, which runs the kprop command. This 
periodically propagates database changes from the primary to the 
secondary authentication server. Whenever the Kerberos database is 
changed, the kprop command may also be run manually to synchronize the 
Kerberos database contents. 

6.11.2 Managing the Kerberos secondary server database 

Both the kerberos and kpropd daemons run on the secondary authentication 
server and must be active all the time. 

The kpropd daemon automatically performs updates on the secondary server 
database. 

The kpropd daemon is activated when the secondary server boots up. If the 
kpropd daemon becomes inactive, it may be automatically reactivated by the 
AIX System Resource Controller (SRC). It may also be restarted manually by 
using the startsrc command. The history of restarting the daemon is kept in 
the log file called /var/adm/SPlogs/kerberos/kpropd.log. 


6.12 SP services that utilize Kerberos 

On the SP, there are three different sets of services that use Kerberos 
authentication: The hardware control subsystem, the remote execution 
commands, and the sysctl facility. This section describes the authentication 
of these services and the different means they use to authorize clients that 
have been successfully authenticated. 

6.12.1 Hardware control subsystem 

The SP hardware control subsystem is implemented through the hardmon 
and splogd daemons, which run on the control workstation and interface with 
the SP hardware through the serial lines. To secure access to the hardware, 
Kerberos authentication is used, and authorization is controlled through 
hardmon-specific Access Control Lists (ACLs). PSSP V3.1 and earlier 
releases support only Kerberos Version 4, not Version 5, authentication. 

The following commands are the primary clients to the hardware control 
subsystem: 

• hmmon: Monitors the hardware state. 

• hmcmds: Changes the hardware state. 

• s1term: Provides access to the node’s console. 

• nodecond: For network booting, uses hmmon, hmcmds, and s1term. 

• spmon: Some parameters are used to monitor; some are used to change the 
hardware state. The spmon -open command opens an s1term connection. 

Other commands, like sphardware from the SP Perspectives, communicate 
directly with an internal hardmon API that is also Kerberos Version 4 
authenticated. 

To Kerberos, the hardware control subsystem is a service represented by the 
principal name hardmon. PSSP sets up one instance of that principal for each 
network interface of the control workstation, including IP aliases in case of 
multiple partitions. The secret keys of these hardmon principals are stored in 
the /etc/krb-srvtab file on the control workstation. The k4list -srvtab 
command shows the existence of these service keys. 


# k4list -srvtab 
Server key file:   /etc/krb-srvtab 
Service         Instance        Realm              Key Version 
hardmon         sp4cw0          MSC.ITSO.IBM.COM   1 
rcmd            sp4cw0          MSC.ITSO.IBM.COM   1 
hardmon         sp4en0          MSC.ITSO.IBM.COM   1 
rcmd            sp4en0          MSC.ITSO.IBM.COM   1 


The above client commands perform a Kerberos Version 4 authentication. 
They require that the user who invokes them has signed on to Kerberos with 
the k4init command, and they pass the user’s Ticket-Granting Ticket to the 
Kerberos server to acquire a service ticket for the hardmon service. This 
service ticket is then presented to the hardmon daemon, which decrypts it 
using its secret key stored in the /etc/krb-srvtab file. 

Authorization to use the hardware control subsystem is controlled through 
entries in the /spdata/sys1/spmon/hmacls file, which is read by hardmon 
when it starts up. Since hardmon runs only on the control workstation, this 
authorization file exists only on the control workstation. 



Each line in the file lists an object, a Kerberos principal, and the associated 
permissions. Objects can either be host names or frame numbers. By default, 
PSSP creates entries for the control workstation and for each frame in the 
system, and the only principals that are authorized are root.admin and the 
instance of hardmon for the SP Ethernet adapter. There are four different 
sets of permissions indicated by a single lowercase letter: 

• m (Monitor) - Monitor hardware status 

• v (Virtual Front Operator Panel) - Control/change hardware status 

• s (SI) - Access to node’s console through the serial port (s1term) 

• a (Administrative) - Use hardmon administrative commands 

Note that for the control workstation, only administrative rights are granted. 
For frames, the monitor, control, and SI rights are granted. These default 
entries should never be changed. However, other principals might be added. 
For example, a site might want to grant operating personnel access to the 
monitoring facilities without giving them the ability to change the state of the 
hardware or access the nodes’ consoles. 
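
To illustrate how such entries translate into authorization decisions, the following Python sketch parses lines of the form object, principal, permissions and checks a single permission letter. It is a simplified model, not the hardmon implementation; the function names, the principal opsuser, and the sample entries are hypothetical.

```python
def parse_hmacls(text):
    """Parse lines of the form '<object> <principal> <permissions>'."""
    acl = {}
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        obj, principal, perms = parts
        acl[(obj, principal)] = set(perms)
    return acl


def is_authorized(acl, obj, principal, needed):
    """True if `principal` holds permission letter `needed` on `obj`."""
    return needed in acl.get((obj, principal), set())


# Hypothetical content: default-style entries for the control workstation
# and frame 1, plus an added operator principal that may only monitor.
example = """\
sp4en0 root.admin a
1 root.admin vsm
1 opsuser m
"""
acl = parse_hmacls(example)
```

With these entries, opsuser can monitor frame 1 but cannot open a console or change hardware state, which matches the operator scenario described above.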

- Note: Refreshing hardmon - 

When the hmacls file is changed, the hmadm setacls command must be 
issued on the control workstation to notify the hardmon daemon of the 
change and cause it to reread that file. The principal that issues the hmadm 
command must have administrative rights in the original hmacls file; 
otherwise, the refresh will not take effect. However, hardmon can always 
be completely stopped and restarted by the root user. This will re-read the 
hmacls file. 


Care must be taken if any of the hardware monitoring/control commands 
are issued by users that are authenticated to Kerberos but do not have the 
required hardmon authorization. In some cases, an error message will be 
returned, for example: 

hmmon: 0026-614 You do not have authorization to access the Hardware Monitor. 

In other cases, no error message, or a misleading one, may be returned. This 
mostly happens when the principal is listed in the hmacls file but not with the 
authorization required by the command. 

In addition to the above commands, which are normally invoked by the 
system administrator, two SP daemons are also hardmon clients: The splogd 
daemon and the hmrmd daemon. These daemons use two separate ticket 
cache files: /tmp/tkt_splogd and /tmp/tkt_hmrmd. Both contain tickets for the 
hardmon principal, which can be used to communicate with the hardmon 
daemon without the need to type in passwords. 

6.12.2 Remote execution commands 

In releases prior to PSSP V3.1, PSSP provided its own remote execution 
commands and the corresponding krshd daemon that were Kerberos 
Version 4 authenticated. These were located in /usr/lpp/ssp/rcmd/. All the SP 
management commands used the PSSP version of rsh and rcp, and AIX 
provided the original r-commands in /usr/bin/. This is shown in Figure 96 on 
page 205. 


Figure 96. Remote shell structure before PSSP 3.1 

In PSSP V3.1, the authenticated r-commands in the base AIX 4.3.2 operating 
system are used instead. They can be configured for multiple authentication 
methods including the PSSP implementation of Kerberos Version 4. To allow 
applications that use the full PSSP paths to work properly, the PSSP 
commands rcp and remsh/rsh have not been simply removed but have been 
replaced by links to the corresponding AIX commands. This new calling 
structure is shown in Figure 97 on page 206. 


Figure 97. Remote shell structure in PSSP 3.1 

For the user, this switchover from PSSP-provided authenticated r-commands 
to the AIX-provided authenticated r-commands should be transparent. In the 
remainder of this section, we look at some of the implementation details of 
the security integration of the r-commands, focusing on the AIX rsh 
command and the corresponding rshd and krshd daemons. 

6.12.2.1 Control flow in the rsh command 

The full syntax of the AIX authenticated rsh command is: 

/usr/bin/rsh RemoteHost [-n] [-l RemoteUser] \ 
    [-f | -F] [-k Realm] [Command] 

Here, we assume that a command is present. When the rsh command is 
called, it issues the get_auth_method() system call, which returns the list of 
authentication methods that are enabled on the machine. It then attempts a 
remote shell connection using these methods, in the order they are returned, 
until one of the methods succeeds or all have failed. 

- Note: K5MUTE - 

Authentication methods are set on a system level, not on a user level. This 
means that, for example, on an SP where Kerberos Version 4 and 
Standard AIX are set, a user’s rsh command will produce a Kerberos 
authentication failure if that user has no Kerberos credentials (which is 
normally the case unless the user is an SP system administrator). After 
that failure, rsh attempts to use the standard AIX method. The delay 
caused by attempting both methods cannot be prevented, but there is a 
means to suppress the error messages of failed authentication requests, 
which may confuse users: setting the environment variable K5MUTE=1 
suppresses these messages. Authorization failures will still be 
reported, though. 
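
The ordered fallback described in this note can be modeled with a short Python sketch. It is only an illustration of the control flow (try each enabled method in order, stop at the first success); the function names and the scenario are hypothetical and do not reproduce the actual rsh code.

```python
def try_methods(enabled_methods, attempt):
    """Try each enabled method in order; return (method_used, failed_methods).

    `attempt` is a caller-supplied function that returns True on success.
    method_used is None when every method has failed.
    """
    failures = []
    for method in enabled_methods:
        if attempt(method):
            return method, failures
        failures.append(method)
    return None, failures


# Hypothetical scenario: K4 and STD are enabled, and the user has no
# Kerberos credentials, so only the standard AIX method can succeed.
def attempt_without_tickets(method):
    return method == "STD"


used, failed = try_methods(["K4", "STD"], attempt_without_tickets)
```

In this scenario the K4 attempt fails first, which corresponds to the error message that K5MUTE=1 suppresses, and the connection is then made with STD.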


This is what happens for the three authentication methods: 

STD When the standard AIX authentication is to be used, rsh uses the 
rcmd() system call from the standard C library (libc.a). The shell 
port (normally 514/tcp) is used to establish a connection to the 
/usr/sbin/rshd daemon on the remote host. The name of the local 
user, the name of the remote user, and the command to be 
executed are sent. This is the normal BSD-style behavior. 

K5 For Kerberos Version 5 authentication, the kcmd() system call is 
issued (this call is not provided in any library). It acquires a service 
ticket for the /.:/host/<ip_name> service principal from the 
Kerberos Version 5 server over the Kerberos port (normally 88). It 
then uses the kshell port (normally 544/tcp) to establish a 
connection to the /usr/sbin/krshd daemon on the remote host. In 
addition to the information for STD authentication, kcmd() sends 
the Kerberos Version 5 service ticket for the rcmd service on the 
remote host for authentication. If the -f or -F flag of rsh is present, 
it also forwards the Ticket-Granting Ticket of the principal that 
invoked rsh to the krshd daemon. Note that ticket forwarding is 
possible with Kerberos Version 5 but not with Version 4. 

K4 Kerberos Version 4 authentication is provided by the PSSP 
software. The system call spk4rsh(), contained in libspk4rcmd.a in 
the ssp.client fileset, will be invoked by the AIX rsh command. It 
will acquire a service ticket for the rcmd.<ip_name> service 
principal from the Kerberos Version 4 server over the kerberos4 
port 750. Like kcmd(), the spk4rsh() subroutine uses the kshell 
port (normally 544/tcp) to connect to the /usr/sbin/krshd daemon 
on the remote host. It sends the STD information and the 
Kerberos Version 4 rcmd service ticket but ignores the -f and -F 
flags since Version 4 Ticket-Granting Tickets are not forwardable. 

These requests are then processed by the rshd and krshd daemons. 

6.12.2.2 The standard rshd daemon 

The /usr/sbin/rshd daemon listening on the shell port (normally 514/tcp) of the 
target machine implements the standard, BSD-style rsh service. Details can 
be found in "rshd Daemon" in the AIX 4.3 Commands Reference Volume 5, 
SC23-4119. Notably, the rshd daemon: 

• Does some health checks, such as verifying that the request comes from a 
well-known port. 

• Verifies that the local user name (remote user name from the client’s view) 
exists in the user database and gets its UID, home directory, and login 
shell. 

• Performs a chdir() to the user’s home directory (terminates if this fails). 

• If the UID is not zero, rshd checks if the client host is listed in 
/etc/hosts.equiv. 

• If the previous check is negative, rshd checks if the client host is listed in 
$HOME/.rhosts. 

• If either of these checks succeeded, rshd executes the command under 
the user’s login shell. 
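
The order of the host checks in this list can be expressed as a small Python sketch. It is a simplified model: the two files are represented as sets of host names, whereas real entries may also name users, and the function and host names are hypothetical.

```python
def rshd_authorized(uid, client_host, hosts_equiv, rhosts):
    """Model the host checks listed above.

    hosts_equiv and rhosts are sets of host names standing in for the
    contents of /etc/hosts.equiv and $HOME/.rhosts.
    """
    # /etc/hosts.equiv is consulted only when the UID is not zero.
    if uid != 0 and client_host in hosts_equiv:
        return True
    # Otherwise fall back to the user's own .rhosts file.
    return client_host in rhosts


# Hypothetical host name used for illustration.
root_via_equiv = rshd_authorized(0, "sp4n01", {"sp4n01"}, set())
user_via_equiv = rshd_authorized(201, "sp4n01", {"sp4n01"}, set())
```

The two sample calls show the consequence of the UID test: a hosts.equiv entry is enough for an ordinary user but never for root, who must be listed in .rhosts.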

Be aware that the daemon itself does not call the get_auth_method() 
subroutine to check if STD is among the authentication methods. The 
chauthent command simply removes the shell service from the /etc/inetd.conf 
file when it is called without the -std option; so, inetd will refuse connections 
on the shell port. But if the shell service is enabled again by editing 
/etc/inetd.conf and refreshing inetd, the rshd daemon will honor requests 
even though lsauthent still reports that Standard AIX authentication is 
disabled. 

6.12.2.3 The Kerberized krshd daemon 

The /usr/sbin/krshd daemon implements the kerberized remote shell service 
of AIX. It listens on the kshell port (normally 544/tcp) and processes the 
requests from both the kcmd() and spk4rsh() client calls. 

In contrast to rshd, the krshd daemon actually uses get_auth_method() to 
check if Kerberos Version 4 or 5 is a valid authentication method. For 
example, if a request with a Kerberos Version 4 service ticket is received, but 
this authentication method is not configured, the daemon replies with: 

krshd: Kerberos 4 Authentication Failed: This server is not configured 
to support Kerberos 4. 


After checking if the requested method is valid, the krshd daemon then 
processes the request. This, of course, depends on the protocol version. 

Handling Kerberos Version 5 requests 

To authenticate the user, krshd uses the Kerberos Version 5 secret key of the 
host/<ip_hostname> service and attempts to decrypt the service ticket sent 
by the client. If this succeeds, the client has authenticated itself. 

The daemon then calls the kvalid_user() subroutine, from libvaliduser.a, with 
the local user name (remote user name from the client’s view) and the 
principal’s name. The kvalid_user() subroutine checks if the principal is 
authorized to access the local AIX user’s account. Access is granted if one of 
the following conditions is true: 

1. The $HOME/.k5login file exists and lists the principal (in Kerberos form). 

2. The $HOME/.k5login file does not exist, and the principal name is the 
same as the local AIX user’s name. 

Case (1) is what is expected. But be aware that case (2) above is quite 
counter-intuitive: It means that if the file does exist and is empty, access is 
denied, but if it does not exist, access is granted. This is the complete reverse 
of the behavior of both the AIX $HOME/.rhosts file and the Kerberos 
Version 4 $HOME/.klogin file. However, it is documented to behave this way 
(and actually follows these rules) in both the kvalid_user() man page and the 
AIX Version 4.3 System Management Guide: Communications and Networks, 
SC23-4127. 
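
The two conditions, including the counter-intuitive empty-file case, can be captured in a short Python sketch. It models only the decision rule stated above, not the actual kvalid_user() code. The function name, its parameters, and the sample principal are hypothetical, and the caller is assumed to supply both the full principal and its bare name.

```python
def k5_access_granted(k5login_entries, principal, principal_name, local_user):
    """Apply the two access conditions described above.

    k5login_entries is None when $HOME/.k5login does not exist; otherwise
    it is the (possibly empty) list of principals found in the file.
    """
    if k5login_entries is None:
        # Case 2: no file, so the principal name must equal the user name.
        return principal_name == local_user
    # Case 1: the file exists, so only listed principals are accepted.
    return principal in k5login_entries


# Hypothetical principal used for illustration.
no_file = k5_access_granted(None, "joe@EXAMPLE.REALM", "joe", "joe")
empty_file = k5_access_granted([], "joe@EXAMPLE.REALM", "joe", "joe")
```

Here no_file is granted while empty_file is denied, which is exactly the asymmetry the text warns about.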

If the authorization check has passed, the krshd daemon checks if a Kerberos 
Version 5 TGT has been forwarded. If this is the case, it calls the k5dcelogin 
command that upgrades the Kerberos TGT to full DCE credentials and 
executes the command in that context. If this k5dcelogin cannot be done 
because no TGT was forwarded, the user’s login shell is used to execute the 
command without full DCE credentials. 

- Note: DFS home directories - 

Note that this design may cause trouble if the user’s home directory is 
located in DFS. Since the kvalid_user() subroutine is called by krshd before 
establishing a full DCE context via k5dcelogin, kvalid_user() does not have 
user credentials. It runs with the machine credentials of the local host and 
can only access the user’s files if they are open to the other group of users. 
The files do not need to be open for the any_other group (and this would 
not help, either) since the daemon always runs as root and, therefore, has 
the hosts/<ip_hostname>/self credentials of the machine. 


Handling Kerberos Version 4 requests 

To authenticate the user, krshd uses the Kerberos Version 4 secret key of the 
rcmd.<ip_hostname> service and attempts to decrypt the service ticket sent 
by the client. If this succeeds, the client has authenticated itself. 

The daemon then checks the Kerberos Version 4 $HOME/.klogin file and 
grants access if the principal is listed in it. This is all done by code provided 
by the PSSP software, which is called by the base AIX krshd daemon. For 
this reason, Kerberos Version 4 authentication is only available on SP 
systems, not on normal RS/6000 machines. 

- Note: rcmdtgt - 

PSSP 3.1 still includes the /usr/lpp/ssp/rcmd/bin/rcmdtgt command, which 
can be used by the root user to obtain a ticket-granting ticket by means of 
the secret key of the rcmd.<localhost> principal stored in /etc/krb-srvtab. 


6.12.2.4 NIM and remote shell 

There is one important exception to keep in mind with respect to the security 
integration of the rsh command: When using boot/install servers, NIM will use 
a remote shell connection from the boot/install server to the control 
workstation to update status information about the installation process that is 
stored on the control workstation. This connection is made by using the 
rcmd() system call rather than the authenticated rsh command. The rcmd() 
system call always uses standard AIX authentication and authorization. 

To work around this problem, PSSP uses the authenticated rsh command to 
temporarily add the boot/install server’s root user to the .rhosts file of the 
control workstation and removes this entry after network installation. 

6.13 AFS as an SP Kerberos-based security system 

PSSP supports the use of an existing AFS server to provide Kerberos 
Version 4 services to the SP. It does not include the AFS server itself. 

Before installing PSSP on the control workstation, an AFS server must be 
configured and accessible. The setup_authent script, which initializes the SP’s 
authentication environment, supports AFS as the underlying Kerberos server. 
This is mainly contained in its setup_afs_server sub-command. 

PSSP: Installation and Migration Guide, GA22-7347 explains the steps that 
are required to initially set up SP security using an AFS server, and PSSP: 
Administration Guide, SA22-7348 describes the differences in the 
management commands of PSSP Kerberos and AFS Kerberos. 

AFS uses a different set of protocols, utilities, daemons, and interfaces for 
principal database administration. 

Usage of AFS on SP systems is optional. 

6.13.1 Setup to use AFS authentication server 

• When running the setup_authent command, be sure to answer yes to the 
question on whether you want to set up authentication services to use 
AFS servers. 

• The control workstation (CWS) may be an AFS server or an AFS client. 

• The AFS files ThisCell and CellServDB should be in /usr/vice/etc, or a 
symbolic link should be created. 

• The kas command should be located in /usr/afsws/etc, or a symbolic link 
should be created. 

• AFS must be defined with an administrative attribute. 

• Run setup_authent, providing the name and password of the AFS 
administrator. 

• Issue the k4list command to check for a ticket for the administration 
account. 

6.13.2 AFS commands and daemons 

AFS provides its own set of commands and daemons. The AFS daemon is 
afsd, which is used to connect AFS clients and servers. 

Table 12 contains some commands that may be used for managing AFS. 

Table 12. Some commands for managing AFS 

Command             Description 

kas                 For adding, listing, deleting, and changing the AFS 
                    principal’s attributes. kas has the following 
                    subcommands: 
                    examine (for displaying a principal’s information) 
                    create (for adding principals and setting passwords) 
                    setfields (for adding an authentication administrator 
                    and for changing principal passwords and attributes) 
                    delete (for deleting principals) 

kinit               For obtaining authentication credentials. 

klog.krb            For obtaining authentication credentials. 
(AFS command) 

klist or k4list     For displaying authentication credentials. 

token.krb           For displaying authentication credentials. 
(AFS command) 

kdestroy            For deleting authentication credentials, which involves 
                    removing tickets from the Kerberos ticket cache file. 

klog.krb            The user interface to get Kerberos tickets and AFS 
                    tokens. 

unlog               For deleting authentication credentials, which involves 
                    removing tokens held by the AFS cache manager. 

kpasswd             For changing passwords. 

pts                 This is the AFS protection services administration 
                    interface. It has the following subcommands: 
                    adduser (for adding a user to a group) 
                    chown (for changing ownership of a group) 
                    creategroup (for creating a new group) 
                    delete (for deleting a user or group from the database) 
                    examine (for examining an entry) 
                    listowned (for listing groups owned by an entry) 
                    membership (for listing membership of a user or group) 
                    removeusers (for removing a user from a group) 
                    setfields (for setting fields for an entry) 

6.14 Sysctl is an SP Kerberos-based security system 

The sysctl security system can provide root authority to non-root users based 
on their authenticated identity and the task they are trying to perform. 

Sysctl can also be run as a command line command. 

Usage of sysctl on SP systems is optional. 

6.14.1 Sysctl components 

The server daemon for sysctl is sysctld. The sysctl server also 
contains built-in commands, configuration files, access control lists (ACLs), 
and client programs. 

Figure 98 shows the sysctl architecture. 


[Figure 98 components: Kerberos server, service tickets, sysctl client, 
sysctl server (sysctld), configuration file /etc/sysctl.conf] 

Figure 98. Sysctl architecture 


6.14.2 Sysctl process 

The following is the sysctl process: 

1. The sysctl client code gets its authentication information from SP 
authentication services, Kerberos. 

2. The sysctl client code sends the authentication information with the 
Service Tickets and commands to the specified sysctl server. 

3. The server then performs the following tasks: 

• Authenticates the clients 

• Decodes the service ticket 

• Performs an authorization callback 

• Executes commands as root 

6.14.3 Terms and files related to the Sysctl process 

• Authorization callback: Once the client has been authenticated, the sysctl 
server invokes the authorization callbacks just before executing the 
commands. 

• Access control lists (ACL): These are text-based files that are used to give 
authority to specific users to execute certain commands. 

• Configuration files: There are two main configuration files related to sysctl: 

1. The /etc/sysctl.conf file configures the local sysctl server daemon by 
optionally creating variables, procedures and classes, setting 
variables, loading shared libraries, and executing sysctl commands. 
The /etc/sysctl.conf file is on every machine that runs the sysctld 
daemon. 

2. The /etc/sysctl.acl file contains the list of users authorized to access 
objects that are assigned the ACL authorization callback. 

• Tcl-based set of commands: Access to this is provided by the sysctld 
daemon. Tcl-based set of commands can be separated in the following 
three classes: 

1. Base Tcl commands: These are the basic Tcl interpreter 
commands. They are also defined in the /etc/sysctl.conf file. 

2. Built-in sysctl commands: These are Tcl-based IBM-written 
applications ready to be used by the sysctl programmer. These ACL 
processing commands include acladd, aclcheck, aclcreate, 
acldelete, and so on. 

3. User-written scripts: These are programmer-written applications that 
use the base Tcl commands and built-in sysctl commands. 


6.15 Related documentation 

The following documentation list consists of books that provide further 
explanation on the key concepts discussed in this chapter. 

SP Manuals 

PSSP: Administration Guide, SA22-7348. Two chapters are on security on the 
SP system. Chapter 12 concentrates on security features of the SP system 
that includes conceptual information regarding Kerberos Version 4. Chapter 
13 is on sysctl, which covers its relationship to the SP system. 

SP Redbooks 

Inside the RS/6000 SP, SG24-5145. Section 4.6 of Chapter 4 is on SP 
Security. It gives a good overview of Kerberos, AFS, and sysctl. 

RS/6000 Scalable POWERparallel Systems, SG24-4542. Part 5 is solely on 
Kerberos and contains Chapters 12-19. They give details on Kerberos, which 
covers Kerberos secure authentication, Kerberos authentication protocols, 
installing Kerberos primary and secondary servers, how to implement 
Kerberos on the SP, and description of a list of Kerberos files. 

Study Guides 

IBM Global Services, RS/6000 SP, System Administration: Course Code 
AU96. (Unit 1 is on managing Kerberos authentication in the SP 
environment.) This book covers what Kerberos is used for on the SP, how to 
manage Kerberos principal authentication, how to keep Kerberos secure, and 
considerations on authentication server backup and recovery. Unit 5 is on 
working with the sysctl security system. Appendix C covers an overview of 
AFS authentication. 


6.16 Sample questions 

This section provides a series of questions to help you prepare for 
the certification exam. The answers to these questions can be found in 
Appendix A. 

1. On the SP, the AIX command chauthent should not be used directly 
because: 

A. It is not supported on RS/6000 SP environments. 

B. It does not provide Kerberos v4 authentication. 

C. The rc.sp script will reset any change made locally using the chauthent 
command. 

D. The rc.sp script will fail if the chauthent command is used on a node. 

2. PSSP requires Kerberos v4 because some components still use this 
Kerberos level. These components are: 

A. hardmon and the nodecond scripts 

B. The partition-sensitive daemons and file collection 

C. hardmon and NIM 

D. hardmon and sysctl 

3. The /etc/krb-srvtab file contains: 

A. The ticket-granting ticket (TGT). 

B. The list of principals authorized to invoke remote commands. 

C. The master key encrypted with the root.admin password. 

D. The private Kerberos keys for local services. 

4. Which of the following is not a Kerberos client in a standard PSSP 
implementation? 

A. IBM SP Perspectives 

B. The hardmon daemon 

C. Remote shell (rsh) 

D. The system control facility (sysctl) 

5. Which service names are used by the Kerberos-authenticated applications 
in an SP system? (Select two) 

A. hardmon 

B. rsh 

C. rcp

D. rcmd

6. Which of the following statements is used to add a Kerberos Principal? 

A. Remote shell (rsh). 

B. Use the mkkp command to create a principal. 

C. Use the kerberos_edit command. 

D. The system control facility (sysctl). 

7. Which of the following SP services does NOT use Kerberos 
authentication? 

A. The sysctl facility 

B. The hardware control subsystem 

C. The remote execution commands 

D. Service and Manufacturing Interface 

8. Which of the following authentication methods is optional on an SP?

A. STD

B. K4 

C. AFS 

D. K5 

9. Which of the following is NOT a Kerberos daemon? 

A. kpropd 

B. kinit 

C. kadmind 

D. kerberos 

10. After changing the master password, the administrator enters the kstash 
command. Which of the following statements is true? 

A. The command will propagate the new password to the secondary 
authentication servers. 

B. The command deletes the current cache file. 

C. The command stores the new master key in the /kstash file.

D. The command kills and restarts the kadmin daemon. 


6.17 Exercises 

Here are some exercises you may wish to perform: 

1. On a test system that does not affect any users, configure the SP 
authentication services on the control workstation (CWS) and other RS/6000 
workstations connected to the SP system. Change the principal’s password, 
change the Kerberos master password, store the new master key, and stop 
and start the server daemons for the changes to take effect.

2. On a test system that does not affect any users, add a Kerberos principal. 

3. On a test system that does not affect any users, change the attributes of 
the Kerberos principal. 

4. Delete the above created Kerberos principal. 

5. On a test system that does not affect any users, set up and initialize a
secondary Kerberos server.


Chapter 7. User and data management

This chapter covers user management that consists of adding, changing, and 
deleting users on the SP system and how to control user login access. 

Data and user management using the file collections facility is also 
discussed. File collection provides the ability to have a single point of update 
and control of file images that will then be replicated across nodes. 

AMD and AIX Automounters are also discussed. These allow users local
access to any files and directories no matter which node they are logged in
to.


7.1 Key concepts you should study 

The key concepts on user and data management are listed below in order of 
importance. 

• Considerations for administering SP users and SP user access control 
and procedures to perform them. 

• File collections and how they work in data management in the SP system.

• How to work with and manage file collections and procedures to build and 
install file collections. 

• The concepts of AMD and AIX Automounter and how they manage 
mounting and unmounting activities using NFS facilities. 

• AMD to AIX Automounter migration and the main differences between the 
two. 


7.2 Issues on administering users on the SP system 

Table 13 consists of the issues and solutions on user and data management. 
You need to consider them when installing an SP system. 


Table 13. Issues and solutions when installing an SP system 


Issues                                        Solutions
--------------------------------------------  --------------------------
How to share common files across the SP       File Collections
system?                                       NIS

How to maintain a single user space?          File Collections
                                              NIS
                                              AMD
                                              AIX Automounter

Within a single user space, how to restrict   Login Control
access to an individual node?

Where should user’s home directories          Control Workstation (CWS)
reside?                                       Nodes
                                              Other Network System

How does a user change access data?           AMD
                                              AIX Automounter

How does a user change the password?          File Collections
                                              NIS

How to keep access to nodes secure?           Kerberos
                                              AIX Security


SP User Management (SPUM) must be set up to ensure that there is a single 
user space across all nodes. It ensures that users have the same account, 
home directory, and environment across all nodes in the SP system. 


7.3 SP User data management 

The following three options may be used to manage the user data on the SP: 

• SP User Management (SPUM) 

• Network Information System (NIS) 

• Manage each user individually over each machine on the network. 

The first two are more commonly used and are discussed in this chapter. 

7.3.1 SP User Management (SPUM) 

The following information is covered by this chapter: 

• How to set up SP User management. 

• How to add/change/delete/list SP users. 

• How to change SP user passwords. 

• SP user login and access control. 

7.3.2 Set up SP User Management 

1. Enter smit site_env_dialog. The output is shown in Figure 99 on page 221.

                       Site Environment Information

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                              [Entry Fields]
  Default Network Install Image              [bos.obj.ssp.432]
  Remove Install Image after Installs         false                  +
  NTP Installation                            consensus              +
  NTP Server Hostname(s)                     [""]
  NTP Version                                 3                      +
  Automounter Configuration                   true                   +
  Print Management Configuration              false                  +
  Print system secure mode login name        [""]
  User Administration Interface               true                   +
  Password File Server Hostname              [sp3en0]
  Password File                              [/etc/passwd]
  Home Directory Server Hostname             [sp3en0]
  Home Directory Path                        [/home/sp3en0]
  File Collection Management                  true                   +
  File Collection daemon uid                 [102]
  File Collection daemon port                [8431]                  #
  SP Accounting Enabled                       false                  +
  SP Accounting Active Node Threshold        [80]                    #
  SP Exclusive Use Accounting Enabled         false                  +
  Accounting Master                          [0]
  Control Workstation LPP Source Name        [aix432]

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do


Figure 99. Set up SP User Management 

2. Activate SPUM by setting the following fields to true. 

• Automounter Configuration 

• User Administration Interface 

• File Collection Management 

7.3.3 Add/change/delete/list SP users 

Using the SP User Management commands, you can add and delete users,
change account information, and set defaults for your users' home
directories. Specify the user management options you wish to use in your site
environment during the installation process, or change them later, either
through SMIT panels or by using the spsitenv command. Users are added
through SMIT by entering smit spmkuser. The PSSP Installation and Migration
Guide, GA22-7347, contains detailed instructions for entering site
environment information.

The following are the steps for adding an SP user by entering smit spmkuser: 

• Check /usr/lpp/ssp/bin/spmkuser.default file for defaults for primary 
group, secondary groups, and initial programs. 

• The user’s home directory default location is retrieved from the
homedir_server and homedir_path attributes of the SDR SP class.

• spmkuser only supports the following user attributes: id, pgrp, home (in
hostname:home_directory_path format), groups, gecos, shell, and
login.

• A random password is generated and is stored in the 
/usr/lpp/ssp/config/admin/newpass.log file. 

Figure 100 shows the output screen for changing the characteristics of an SP 
user. All value fields can be changed except the name field. Nodes will pull
the SPUM file collection from the CWS and update their configuration.


                  Change/Show Characteristics of a User

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                          [Entry Fields]
* User NAME                                spuser1
  User ID                                 [218]                      #
  LOGIN user?                                                        +
  PRIMARY group                           [1]                        +
  Secondary GROUPS                        [staff]                    +
  HOME directory                          [sp3en0:/home/sp3en0/sp>
  Initial PROGRAM                         [/bin/ksh]                 /
  User INFORMATION                        []

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do


Figure 100. Changing the characteristics of an SP user

Figure 101 shows the output screen for removing an SP user with the smit
sprmuser command. Both authentication information and home directory may
be removed. When a user is deleted, the entry for that user in the
newpass.log file is not removed.


                              Remove a User

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                          [Entry Fields]
  Remove AUTHENTICATION information?       No                        +
  Remove HOME directory?                   No                        +
* User NAME                                spuser1
  User ID                                  218
  PRIMARY group                            1
  Secondary GROUPS                         staff
  HOME directory                           /u/spuser1 on sp3en0(/>
  Initial PROGRAM                          /bin/ksh
  User INFORMATION

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do


Figure 101. Removing an SP user 

The following example shows how to list SP users with the spluser command:

spluser spuser1

The output will appear as the following:

spuser1 id=201 pgrp=1 groups=staff home=/u/spuser1 on
sp3en0:/home/sp3en0/spuser1 shell=/bin/ksh gecos= login=true
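
When this output has to be checked by a script, the key=value layout shown above can be split mechanically. The following is a minimal Python sketch under the assumption that every record looks like the sample line; the function name parse_spluser_line is hypothetical and is not part of PSSP.

```python
import re

def parse_spluser_line(line):
    """Split one spluser-style record into (user name, attribute dict).

    Hypothetical helper: it assumes the record layout shown in the sample
    output, with the user name first and key=value attributes after it.
    """
    name, _, rest = line.strip().partition(" ")
    # A value runs until the next " key=" or the end of the line, so the
    # "home=/u/... on host:/path" value stays in one piece.
    pairs = re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", rest)
    return name, dict(pairs)

sample = ("spuser1 id=201 pgrp=1 groups=staff home=/u/spuser1 on "
          "sp3en0:/home/sp3en0/spuser1 shell=/bin/ksh gecos= login=true")
name, attrs = parse_spluser_line(sample)
print(name, attrs["shell"], attrs["login"])
```

The empty gecos field is kept as an empty string, so a missing value can be told apart from a missing attribute.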


7.3.4 Change SP user passwords 

The SP user passwords may be changed in the following manner: 

• The user must log on to the system where the master password file is. 
Normally, it is on the control workstation (CWS). 

• Use the passwd command to change the password. 

• /etc/passwd and /etc/security/passwd files must be updated on all
nodes.

7.3.5 Login control

It is advisable to limit access to the control workstation (CWS). However,
users need CWS access to change their passwords in a pure SPUM
environment. A script may be used to enable certain users to access the CWS.
This script is:

/usr/lpp/ssp/config/admin/cw_restrict_login

In order to use this script, the /usr/lpp/ssp/config/admin/cw_allowed file must
be edited to include the list of users who are permitted CWS login access. This
file only allows one user per line, starting at the left-most column, and no
comments can be included in the file. The root user is not required to be listed
in this file.
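
The format rules for the allowed-users file can be checked before the file is put in place. The following Python sketch only models the rules stated above (one name per line, left-most column, no comments, root always permitted). It is not the logic of the cw_restrict_login script itself, and the function names are hypothetical.

```python
def load_allowed_users(text):
    """Return the user names from cw_allowed-style text.

    Enforces the stated rules: one name per line, starting in the
    left-most column, with no comments. Anything else raises ValueError.
    """
    users = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are simply skipped
        if line[0].isspace() or "#" in line or len(line.split()) != 1:
            raise ValueError(f"line {number}: expected one user name in column 1")
        users.append(line.strip())
    return users

def may_log_in(user, allowed):
    # The root user does not have to be listed in the file.
    return user == "root" or user in allowed

allowed = load_allowed_users("spuser1\nspuser2\n")
print(may_log_in("spuser1", allowed), may_log_in("guest", allowed))
```

Rejecting a malformed file outright, rather than skipping bad lines, is a design choice of this sketch: a silently ignored line would lock a user out without any hint as to why.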

To make the script work, it must be included in /etc/profile on the CWS. To
remove the login restriction, comment out or delete the lines that
were added to the /etc/profile file.

7.3.6 Access control 

Because interactive users can have a negative impact on parallel jobs
running on nodes, you may need to restrict user access on some nodes. The
spacs_cntrl command must be executed on each node where access control
for a user or group of users is to be set.

To restrict a user (for example, spuser1) on a particular node, enter
spacs_cntrl block spuser1 on that node.

To restrict a group of users on a particular node, create a file with a row of
user names (for example, name_list) and enter spacs_cntrl -f name_list on
that node.

To check what spacs_cntrl is doing, enter: spacs_cntrl -v -l block spuser1


7.4 Configuring NIS 

Although an SP is a machine containing multiple RS/6000 nodes, you do not 
want to maintain an SP as multiple computers but as one system. NIS is one 
of the tools that can make the daily operations of an SP simple and easy. 

NIS is a distributed system that contains system configuration files. By using 
NIS, these files will look the same throughout a network, or in this case, 
throughout your SP machine. NFS and NIS are packaged together. Since the 
SP install image includes NFS, NIS comes along as well.

The most commonly used implementations of NIS are based upon the
distribution of maps containing the information from the /etc/hosts file and
the user-related files: /etc/passwd, /etc/group, and /etc/security/passwd.

NIS allows a system administrator to maintain system configuration files in 
one place. These files only need to be changed once then propagated to the 
other nodes. 

From a user's point of view, the password and user credentials are the same 
throughout the network. This means that the user only needs to maintain one 
password. When the user's home directory is maintained on one machine and 
made available through NFS, the user's environment is also easier to 
maintain. 

From an SP point of view, an NIS solution removes the SP restriction of
changing users' passwords on the control workstation. When you use
File Collections for system configuration file distribution, users have to
change their passwords on the control workstation. When using NIS, you can
control user password management across the SP from any given node.

7.4.1 Setting up NIS 

You can use SMIT to set up NIS, manage it, and control the NIS daemons. In 
your planning, you must decide whether you will have slave servers and 
whether you will allow users to change their passwords anywhere in the 
network. 

7.4.1.1 Configuring a master server 

Configuring a master server can be done by entering the smit mkmaster 
command. 

By default, the NIS master server maintains the following files that should 
contain the information needed to serve the client systems: 

/etc/ethers 

/etc/group 

/etc/hosts 

/etc/netgroup 

/etc/networks 

/etc/passwd 

/etc/protocols 

/etc/publickey

/etc/rpc

/etc/security/group 

/etc/security/passwd 

/etc/services 

Any changes to these files must be propagated to clients and slave servers 
using SMIT: 

Select: Manage NIS Maps 

Select: Build / Rebuild Maps for this Master Server 

Either specify a particular NIS map by entering the name representing the file
name or leave the default value of all, then press Enter. You can also do this
manually by changing to the directory /etc/yp and entering the command make
all or make <map-name>. This propagates the maps to all NIS clients and
transfers all maps to the slave servers.

7.4.1.2 Configuring a slave server 

A slave server is the same as the master server except that it is a read-only
server. Therefore, it cannot update any NIS maps. Making a slave server
implies that all NIS maps will be physically present on the node configured as
the slave server. As with a master server, the NIS map files on a slave server
can be found in /etc/yp/<domainname>.

You may configure a slave server with the smit mkslave command.

Configuring a slave server starts the ypbind daemon that searches for a
server in the network running ypserv. Shortly afterwards, the ypserv daemon
of the slave server itself will start.

In many situations, the slave server must also be able to receive and serve 
login requests. If this is the case, the slave server must also be configured as 
an NIS client. 

7.4.1.3 Configuring an NIS client 

An NIS client retrieves its information from the first server it contacts. The 
process responsible for establishing the contact with a server is ypbind. 

You may configure an NIS client using SMIT by entering smit mkclient
on every node, or by editing the appropriate entries in the script.cust file. This
can be done at installation time or later by changing the file and then
doing a customized boot on the appropriate node.

7.4.1.4 Change NIS password

You may change an NIS user password with the passwd or yppasswd 
commands. 


7.5 File collections 

The SP system also has another tool that ensures that system configuration 
files look the same throughout your SP network. This tool is called File 
Collection Management. 

File collections are sets of files or directories that simplify the management of 
duplicate or common files on multiple systems, such as SP nodes. A file 
collection can be any set of regular files or directories. PSSP is shipped with 
a program called /var/sysman/supper, which is a Perl program that uses the 
Software Update Protocol (SUP) to manage the SP file collections. 

When configuring the SDR, you are asked if you want to use this facility. 
When answered affirmatively, the control workstation configures a 
mechanism for you that will periodically update the system configuration files 
(you specify the interval). The files included in that configuration are: 

• All files in the directory /share/power/system/3.2. 

• Some of the supper files. 

• The AMD files 

• The user administration files (/etc/group, /etc/passwd, and 
/etc/security/group). 

• /etc/security/passwd. 

In terms of user administration, the File Collection Management system is an 
alternative to using NIS for users who are not familiar with NIS or do not want 
to use it. 

7.5.1 Terms and features of file collections 

There are unique terms and features of File Collections, which are covered in 
the following sections. 

7.5.1.1 Terms used when defining File Collections 

• Resident: A file collection that is installed in its true location and able to
be served to other systems.

• Available: A file collection that is not installed in its true location but is
able to be served to other systems.

7.5.1.2 Unique features of File Collections

The following are the unique features of file collections:

• Master Files: 

A file collection directory does not contain the actual files in the 
collection. Instead, it contains a set of Master Files to define the 
collection. Some Master Files contain rules to define which files can 
belong in the collection and others contain control mechanisms, 
such as time stamps and update locks. 

• The supper command interprets the Master Files: 

You handle files in a collection with special procedures and the 
supper commands, rather than with the standard AIX file commands. 
The supper command interprets the Master Files and uses the 
information to install or update the actual files in a collection. You 
can issue these commands in either a batch or interactive mode. 

• /var/sysman/file.collections: 

File collections require special entries in the
/var/sysman/file.collections file, and you need to define them to the
supper program. They also require a symbolic link in the
/var/sysman/sup/lists directory pointing to their list Master File.

• Unique user ID: 

File collections also require a unique, unused user ID for supman, 
the file collection daemon, along with a unique, unused port through 
which it can communicate. 

The default installation configures the user ID, supman_uid, to 102 
and the port, supfilesrv_port, to 8431. You can change these values 
using SMIT or the spsitenv command. 

• supman is the file collection daemon: 

The file collection daemon, supman, requires read access 
permission to any files that you want managed by file collections. 

For example, if you want a security file, such as /etc/security/limits, 
managed, you must add the supman ID to the security group. This 
provides supman with read access to files that have security group 
permission and allows these files to be managed across the SP by 
file collections. You can add supman to the security group by adding
supman to the security entry in the file /etc/group.

7.5.2 File collection types

File collections can be primary or secondary. Primary file collections are used 
by the servers and also distributed out to the nodes. Secondary file 
collections are distributed from the server but not used by the server. 

A primary file collection can contain a secondary file collection. For example, 
the power_system file collection is a primary file collection that consists of the 
secondary file collection, node.root. This means that power_system can be 
installed onto a boot/install server, and all of the files that have been defined
within that file collection will be installed on that boot/install node, including 
those in node.root. However, the files in node.root would not be available on 
that node because they belong to a secondary file collection. They can, 
however, be served to another node. This avoids having to install the files in 
their real or final location. 

A secondary file collection allows you to keep a group of files available on a
particular machine to serve to other systems without having those files
installed.

For example, if you want to have one .profile on all nodes and another .profile 
on the control workstation, consider using the power_system collection 
delivered with the IBM Parallel System Support Programs for AIX. This is a 
primary collection that contains node.root as a secondary collection. 

• Copy .profile to the /share/power/system/3.2 directory on the control 
workstation. 

• If you issue supper install power_system on the boot/install server, the 
power_system collection is installed in the /share/power/system/3.2 
directory. Because the node.root files are in that directory, they cannot be 
executed on that machine but are available to be served from there. In this 
case, .profile is installed as /share/power/system/3.2/.profile. 

• If you issue supper install node.root on a processor node, the files in 
node.root collection are installed in the root directory and, therefore, can 
be executed. Here, /share/power/system/3.2/.profile is installed from the 
file collection as /.profile on the node. 
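
The difference between installing the primary collection and installing the secondary collection comes down to where the same file ends up. The following Python sketch illustrates that mapping for the .profile example above. It is a simplified model, not how supper resolves paths, and the function name installed_path is hypothetical.

```python
import posixpath

# Directory in which the power_system collection is installed (from the text).
POWER_SYSTEM_PREFIX = "/share/power/system/3.2"

def installed_path(collection, relative_name):
    """Where a node.root file lands, depending on which collection is installed."""
    if collection == "power_system":
        # Installed as part of the primary collection: held for serving only.
        return posixpath.join(POWER_SYSTEM_PREFIX, relative_name)
    if collection == "node.root":
        # Installed on a processor node: lands in the root directory.
        return posixpath.join("/", relative_name)
    raise ValueError(f"unknown collection: {collection}")

print(installed_path("power_system", ".profile"))
print(installed_path("node.root", ".profile"))
```

The first call corresponds to supper install power_system on a boot/install server, and the second to supper install node.root on a processor node.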

A secondary file collection is useful when you need a second tier or level for
distributing file collections. This is particularly helpful when using boot/install
servers within your SP or when partitioning the system into groups.

7.5.3 Pre-defined file collections 

On the SP, there is a pre-defined collection of user-administration files: 
/etc/passwd and /etc/services.

PSSP is shipped with four predefined file collections:

• sup.admin 

• user.admin 

• power_system 

• node.root 

Information about each collection on a particular machine can be displayed 
by using the supper status command. You may issue the command 
anywhere. For example: 

/var/sysman/supper status 

7.5.3.1 sup.admin collection 

The sup.admin file collection is a primary collection that is available from the
control workstation, resident (that is, installed) and available on the
boot/install servers, and resident on each processor node.

This file collection is important because it contains the files that define the 
other file collections. It also contains the file collection programs used to load 
and manage the collections. Of particular interest in this collection are: 

• /var/sysman/sup, which contains the directories and Master Files that 
define all the file collections in the system. 

• /var/sysman/supper, which is the Perl code for the supper tool. 

• /var/sysman/file.collections, which contains entries for each file 
collection. 

7.5.3.2 user.admin collection 

The user.admin file collection is a primary collection that is available from the
control workstation, resident and available on the boot/install servers, and
resident on each processor node. This file collection contains files used for
user management. When the user management and file collections options 
are turned on, this file collection contains the following files of particular 
interest: 

• /etc/passwd 

• /etc/group 

• /etc/security/passwd 

• /etc/security/group 

The collection also includes the password index files that are used for login
performance:

/etc/passwd.nm.idx

/etc/passwd.id.idx 

/etc/security/passwd.idx 


7.5.3.3 power_system collection 

The power_system file collection is used for files that are system dependent. 
It is a primary collection that contains one secondary collection called the 
node.root collection. The power_system collection contains no files other 
than those in the node.root collection. 

The power_system collection is available from the control workstation and 
available from the boot/install servers. When the power_system collection is 
installed on a boot/install server, the node.root file collection is resident in the 
/share/power/system/3.2 directory and can be served from there. 

7.5.3.4 node.root collection 

This is a secondary file collection under the power_system primary collection. 
The node.root collection is available from the control workstation, resident
and available on the boot/install servers, and resident on the processor nodes.
It contains key files that are node-specific. 

The node.root file collection is available on the control workstation and the 
boot/install servers under the power_system collection so that it can be 
served to all the nodes. You do not install node.root on the control 
workstation because the files in this collection might conflict with the control 
workstation's own root files. 

7.5.4 File collection structure 

The file collection servers are arranged in a hierarchical tree structure to 
facilitate the distribution of files to a large selection of nodes. 

The control workstation is normally the Master Server for all of the default file 
collections. That is, a master copy of all files in the file collections originates 
from the control workstation. The /var/sysman/sup directory contains the 
Master Files for the file collections. 
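
Because the servers form a tree, a change has to reach each server before the machines below it can pick it up. The following Python sketch derives such an order from a small topology. The host names and the PARENT table are hypothetical and serve only to illustrate the hierarchy.

```python
# Hypothetical topology: each machine names the server it pulls from.
PARENT = {
    "cws": None,      # control workstation, the Master Server
    "bis1": "cws",    # a boot/install server
    "node1": "bis1",  # processor nodes served by bis1
    "node2": "bis1",
}

def update_order(parent):
    """Order machines so that every server comes before its clients."""
    def depth(machine):
        # Number of hops from this machine up to the Master Server.
        hops = 0
        while parent[machine] is not None:
            machine = parent[machine]
            hops += 1
        return hops
    return sorted(parent, key=lambda machine: (depth(machine), machine))

print(update_order(PARENT))
```

Sorting by depth places the control workstation first, then the boot/install servers, then the nodes.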

Figure 102 on page 232 shows the structure of /var/sysman/sup directory, 
which consists of the Master Files for a file collection. 






Figure 102. /var/sysman/sup files and directories 


The following provides an explanation of these files and directories:

.active: Identifies the active volume group. It is not found on the control
workstation.

.resident: Lists each file collection in the SP system. It is not found on the
control workstation.

refuse: Files are listed in this file for exclusion from updates.

supfilesrv.pid: Consists of the process ID of the supfilesrv process.

The directories are:

lists: Contains links to the list files in each file collection.

node.root: Contains the Master Files in the node.root collection.

power_system: Contains the Master Files in the power_system
collection.

sup.admin: Contains the Master Files for the sup.admin collection.

user.admin: Contains the Master Files in the user.admin collection.

An example of an individual file collection with its directory and Master Files 
is illustrated in Figure 103 on page 233. It shows the structure of the 
/var/sysman/sup/sup.admin file collection.

Figure 103. sup.admin master files


The following provides an explanation of these files:

last: Consists of a list of files and directories that have been updated.

list: Consists of a list of files that are part of the file collection.

lock: An empty lock file that prevents more than one update at the same
time.

prefix: Consists of the name of a base directory for file references and the
starting point for the supper scan process.

refuse: Consists of a list of files to be excluded from update.

scan: Consists of a list of files for the collection with their permission and
time stamp.

supperlock: Created by supper to lock a collection during updates.

when: Contains the time for the last file collection update.

The supper command accepts the following subcommands:

activate volume: Sets the active volume group. The active volume group
must be set before installing a collection that requires a file system.

debug: Offers a choice of on or off. Turns debug messages on or off.

diskinfo: Shows available disk space and active volume.

files collection: Shows all files associated with a resident collection.

install collection: Installs a collection.

log: Shows a summary of the last/current supper session.

offline collection: Disables updates of a collection.

online collection: Enables updates of a collection (this is the default).

quit: Exits the program.

remove collection: Removes a collection.

reset collection: Sets the last update time of a collection to the epoch.

rlog: Shows raw output of the last/current supper session.

scan collection: Runs a scan for a collection.

serve: Lists all collections this machine is able to serve.

status: Shows the current status of all available collections. The status
information includes the names of all collections, whether they are resident
on the local machine, and the name and size of the file system associated
with each collection.

update collection: Updates a collection.

verbose: Offers a choice of on or off. Turns SUP output messages on or off.

when: Prints the last update time of all resident collections.

where: Shows the current servers for collections.

! command: Shell escape.

7.5.5 File collection update process 

The file collection update process may be done in two ways: 

• Set up file collection commands in the crontab file to run in a sequence: 

The actual update occurs on the Master Files on the control workstation. 

Issue the update command from the boot/install server to request file 
collection updates from the control workstation. 

Issue the update command from nodes to the boot/install server to obtain 
the required change to its files in the file collections. 

• Issue the /var/sysman/supper update user.admin command on each node.
This can also be performed remotely through the rsh and rexec
commands.

7.5.6 Supman user ID and supfilesrv daemon

• The supman user ID should be a member of the security group, that is,
add supman to the security entry in the /etc/group file. This allows it to have
read access to any files to be managed by file collections.

• User ID for supman must be unique and unused. By default, it is 102. 

• The supfilesrv daemon resides on the master server only. 

7.5.7 Commands to include or exclude files from a file collection 

The following are commands to include or exclude files from a file collection: 

upgrade: Files to be upgraded unless specified by the omit or omitany 
commands. 

always: Files to be upgraded. This ignores omit or omitany commands. 

omit: Files to be excluded from the list of files to be upgraded. 

omitany: Wild card patterns may be used to indicate the range of exclusions. 

execute: The command specified is executed on the client process whenever 
any of the files listed in parentheses are upgraded. 

symlink: Files listed are to be treated as symbolic links.
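
How the upgrade, always, omit, and omitany commands interact is easiest to see on a small example. The following Python sketch applies the rules as described above to plain lists of file names. It is a simplified model and not the real list-file parser; the function name files_to_upgrade and the sample paths are hypothetical.

```python
from fnmatch import fnmatch

def files_to_upgrade(upgrade, always=(), omit=(), omitany=()):
    """Apply the upgrade/always/omit/omitany rules to lists of names."""
    selected = []
    for name in upgrade:
        # omit names a file exactly; omitany is a wild card pattern.
        excluded = name in omit or any(fnmatch(name, pattern) for pattern in omitany)
        if not excluded:
            selected.append(name)
    # Files listed under always are upgraded even if an omit rule matches.
    for name in always:
        if name not in selected:
            selected.append(name)
    return selected

print(files_to_upgrade(
    upgrade=["etc/passwd", "etc/group", "tmp/a.log", "tmp/b.log"],
    always=["tmp/b.log"],
    omit=["etc/group"],
    omitany=["tmp/*.log"],
))
```

In this run, etc/group is dropped by omit and tmp/a.log by omitany, while tmp/b.log survives because always overrides both exclusions.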

7.5.8 Work and manage file collections 

Working with and managing file collections involves the following activities:

• Reporting file collection information. 

• Verifying file collection using the scan command. 

• Updating files in a file collection. 

• Adding and deleting files in a file collection. 

• Refusing files in a file collection. 

Brief explanations of these activities follow.

7.5.8.1 Reporting file collection information 

The supper command is used to report information about file collections. It has
a set of subcommands for managing files and directories, including
verifying information and checking results while a procedure is being
performed.

Table 14 provides a list of the supper subcommands or operands that can be
used to report on the status and activities of the file collections.


Table 14. Brief description of supper subcommands 


Supper        Runs on               Reports on
subcommand
------------  --------------------  -----------------------------------------
where         Node                  Current boot/install servers for
              Boot/Install Server   collections.

when          Node                  Last update time of all resident
              Boot/Install Server   collections.

diskinfo      Boot/Install Server   Available disk space and active volume
              CWS                   on your machine.

log           Node                  Summary of the current or most recent
              Boot/Install Server   supper session.

rlog          Node                  Raw output of the current or most recent
              Boot/Install Server   supper session.

status        Node                  Name, resident status, and access point
              Boot/Install Server   of all available file collections, plus
              CWS                   the name and estimated size of their
                                    associated file systems.

files         Node                  All the resident files resulting from a
              Boot/Install Server   supper update or install command.

serve         Boot/Install Server   All the collections that can be served
              CWS                   from your machine.

scan          Node                  For verifying file collection. It creates
              Boot/Install Server   a scan file in the /var/sysman/sup
              CWS                   directory. The file consists of a list of
                                    files and directories in the file
                                    collection with the date installed and
                                    when it was last updated.

update        CWS                   If a scan file is present, the update
                                    command reads it as an inventory of the
                                    files in the collection and does not do
                                    the directory search.

                                    If there is no scan file in the
                                    collection, the update command will
                                    search the directory, apply the criteria
                                    in the master files, and add the new
                                    file.

install       CWS                   To install a collection.

7.5.8.2 Verifying file collections using scan

Running the supper scan command creates a scan file in the 
/var/sysman/sup directory. The scan file contains: 

• A list of all files and directories in the file collection. 

• The permissions of each file. 

• The date each file was installed and last updated. 

7.5.8.3 Updating files in a file collection 

Make sure changes are made to the files in the master file collection. If there 
is no /var/sysman/sup/scan file on the server, run the supper scan command. 

Run the supper update command, first on any secondary server, then on the 
clients. The supper update command may be included in the crontab file to run 
regularly. 

Supper messages are written to the following files: the 
/var/adm/SPlogs/filec/sup<mm>.<dd>.<yy>.<hh>.<mm> summary file and 
the /var/adm/SPlogs/filec/sup<mm>.<dd>.<yy>.<hh>.<mm>r detailed file. 
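The update sequence described above can be sketched as a small dry-run script. It is not a definitive procedure: the run helper, the function names, and the per-collection scan file path are assumptions made for illustration, and user.admin is used only as an example collection. Commands are printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run sketch: commands are printed, not executed, unless RUN=1.
SUP_DIR=${SUP_DIR:-/var/sysman/sup}      # directory named in the text
SUPPER=${SUPPER:-/var/sysman/supper}     # path used elsewhere in this chapter

run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# On the master: build a scan file only if none exists yet.
# The per-collection scan path is an assumption of this sketch.
scan_if_missing() {
    [ -f "$SUP_DIR/$1/scan" ] || run "$SUPPER" scan "$1"
}

# On any secondary server first, then on each client.
update_collection() {
    run "$SUPPER" update "$1"
}

scan_if_missing user.admin
update_collection user.admin
```

The two functions are kept separate because the scan belongs on the master, while the update is run on the secondary servers and then on the clients.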

7.5.8.4 Adding and deleting files in a file collection 

Prior to performing addition or deletion of files in a file collection, the following 
must be considered: 

• Make sure you are working with the master files. 

• Add or delete files using standard AIX commands. 

• Consider whether the files are in a secondary or primary collection. 

• Check what the prefix, list, and refuse files contain. 

• Check the prefix from the start point of the tree for file collection. 

• If the file is not found in the tree structure, copy the file to it. 

• If the entry is needed in the list file, add the entry to the list file. 

• If there is no scan file on the master, run the supper scan command. 

• Run the supper update command on the nodes. 

7.5.8.5 Refusing files in a file collection 

The refuse file allows you to customize the file collection at different 
locations. It is possible for you to create a file collection with one group of 
files and have different subsets of that group installed on the boot/install 
servers and the nodes. 

The refuse file is created in the /var/sysman/sup directory on the system that 
will not be getting the files listed in the refuse file. 

On a client system, the /var/sysman/sup/refuse file is a user-defined text file 
containing a list of files to exclude from all the file collections. This allows you 
to customize the file collections on each system. You list the files to exclude 
by their fully qualified names, one per line. You can include directories, but 
you must also list each file in that directory you want excluded. 

A second, system-created refuse file in each collection's own directory lists 
all the files excluded during the update process. If the refuse file in the 
/var/sysman/sup directory lists no files for that collection, the 
system-created refuse file will have no entries. 
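A user-defined refuse file might look like the following sketch. The listed paths are hypothetical; the example only shows the layout described above: fully qualified names, one per line, with each file in an excluded directory listed as well. The file is written to a scratch location instead of /var/sysman/sup/refuse.

```shell
#!/bin/sh
# Write a sample refuse file to a scratch location; on a real client it
# would be /var/sysman/sup/refuse. The listed paths are hypothetical.
REFUSE=${REFUSE:-/tmp/sup_refuse_demo/refuse}
mkdir -p "$(dirname "$REFUSE")"

cat > "$REFUSE" <<'EOF'
/etc/tools.conf
/usr/local/tools/data
/usr/local/tools/data/site.db
/usr/local/tools/data/cache.db
EOF

# Flag entries that are not fully qualified (they must start with "/").
grep -v '^/' "$REFUSE" || echo "all entries are fully qualified"
```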

7.5.9 Modifying the file collection hierarchy 

The default hierarchy of updates for file collections is in the following 
sequence: 

1. Control Workstation (CWS) 

2. boot/install servers 

3. Nodes 

However, the default hierarchy can be changed. The following is an example 
of this: 

• Original scenario: 

CWS is the master server for the following two frames for the 
power_system file collection, and node.root is the secondary file 
collection associated with it. 

Frame 1: For the nodes and boot/install server A. 

Frame 2: For the nodes and boot/install server B. 

• Change the hierarchy so that the boot/install server B on Frame 2 will 
become the master server for the power_system file collection: 

Take the boot/install server B off-line on Frame 2 by using the supper 
offline command. This will eliminate the logical path from the CWS to 
the boot/install server B for the power_system file collection. 

• After the hierarchy change: 

If a change is now made to node.root on the CWS, the boot/install 
server A and the nodes on Frame 1 will get updated, but boot/install 
server B and the nodes on Frame 2 will not get updated. 

If the same change is required on boot/install server B, then the update 
must be performed directly to the files on boot/install server B. Then 
the nodes on Frame 2 will get updated as well. 
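For the scenario above, the commands to issue on boot/install server B could be generated as follows. This is a sketch under stated assumptions: the detach_from_cws function is a hypothetical helper, it assumes the supper offline subcommand accepts the collection name as its argument, and it prints the commands so they can be reviewed before being run on the server.

```shell
#!/bin/sh
# Print the commands to run on boot/install server B. Pipe the output to
# sh on that server to execute them. SUPPER is the path used in this chapter.
SUPPER=${SUPPER:-/var/sysman/supper}

detach_from_cws() {
    # $1 = file collection that should no longer be updated from the CWS
    echo "$SUPPER offline $1"
    # Afterwards, review the resulting state of the collections.
    echo "$SUPPER status"
}

detach_from_cws power_system
```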


7.5.10 Steps in building a file collection 

You may create your own file collection. You can build a file collection for any 
group of files that you want to have identically replicated on nodes and 
servers in your system. 

There are seven steps in building a file collection, and you must be root to 
perform all of them. 

1. Identify the files you wish to collect. For example, it has been decided that 
program files (which are graphic tools) in /usr/local directory are to be 
included on all nodes. 

2. Create the file collection directory. In this case, create the 
/var/sysman/sup/tools directory. 

3. Create the master files: list, prefix, lock, and supperlock. Pay attention 
to the list file, which consists of rules for including and excluding files in 
that directory. The lock and supperlock files must be empty. 

4. Add a link to the list file in the /var/sysman/sup/lists directory. For 
example: 

ln -s /var/sysman/sup/tools/list /var/sysman/sup/lists/tools 

5. Update the file.collections file. Add the name of the new file collection as 
either a primary or secondary collection. 

6. Update the .resident file. Edit the .resident file on your control 
workstation or your master server and add your new collection if you want 
to make it resident, or use the supper install command. 

7. Build the scan file by running supper scan. The scan file only works with 
resident collections. It consists of an inventory list of files in the collection 
that can be used for verification and eliminates the need for supper to do a 
directory search on each update. 
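Steps 2 through 4 can be tried out safely with a sketch like the one below, which works under a scratch root instead of the real /var/sysman/sup. The SUP_ROOT variable, the prefix value, and the single list entry are assumptions chosen to match the /usr/local tools example. Steps 5 through 7 (file.collections, .resident, and supper scan) are left out because they depend on the live system.

```shell
#!/bin/sh
# Sketch of steps 2-4 for a collection named "tools", run against a scratch
# root so nothing under the real /var/sysman/sup is touched.
# SUP_ROOT, the list entry, and the prefix value are assumptions.
set -e
SUP_ROOT=${SUP_ROOT:-/tmp/sup_build_demo}
COLL=tools

# Step 2: create the file collection directory.
mkdir -p "$SUP_ROOT/$COLL" "$SUP_ROOT/lists"

# Step 3: create the master files; lock and supperlock must be empty.
echo "/usr/local" > "$SUP_ROOT/$COLL/prefix"
echo "upgrade ./tools" > "$SUP_ROOT/$COLL/list"
: > "$SUP_ROOT/$COLL/lock"
: > "$SUP_ROOT/$COLL/supperlock"

# Step 4: link the list file into the lists directory.
ln -sf "$SUP_ROOT/$COLL/list" "$SUP_ROOT/lists/$COLL"

ls -l "$SUP_ROOT/lists/$COLL"
```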

7.5.11 Installing a file collection 

During initialization and customization processes, the required SP file 
collections are first built on the CWS and then installed on the boot/install 
servers and processor nodes. However, if you create your own, you have to 
install them on each server or node. 

There are four steps involved, and you must be root to perform installation of 
a new file collection. 

1. Update the sup.admin file collection, which contains all the files that 
identify and control the file collections, such as the file.collections and 
.resident files. Whenever changes are made to these two files, you need to 
update the sup.admin collection to distribute these updates. For example: 

/var/sysman/supper update sup.admin 

2. Run the supper install command on each boot/install server or node that 
needs this collection. For example: 

/var/sysman/supper install sup.admin 

3. Add the supper update for the new file collection to crontabs. 

4. Run the supper scan command on the master. 
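For step 3, a crontab entry can be generated with a small helper such as the one sketched here. The cron_entry function, the hourly schedule, the collection name tools, and the output redirections are all assumptions for illustration; adjust them to your site's conventions.

```shell
#!/bin/sh
# Build a crontab entry that updates a collection once an hour. The schedule,
# the collection name "tools", and the redirections are assumptions.
SUPPER=${SUPPER:-/var/sysman/supper}

cron_entry() {
    # $1 = minute of the hour, $2 = collection name
    printf '%s * * * * %s update %s 1>/dev/null 2>&1\n' "$1" "$SUPPER" "$2"
}

cron_entry 10 tools
# On a node, the entry could be appended to root's crontab with:
#   (crontab -l; cron_entry 10 tools) | crontab -
```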

7.5.12 Removing a file collection 

The pre-defined file collections that come with the SP are required. Removing 
them will result in problems when running PSSP. Removing a file collection 
does not delete it. It removes it from the place it was installed. To completely 
delete a file collection, you must remove it from every place it was installed. 

There are two steps in removing a file collection: 

1. Run supper scan to build a scan file. This will help to verify that none of the 
files in the file collection will be needed. 

2. After verification, run the supper remove <file collection> command to 
remove the file collection. 
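The two removal steps, together with the warning about predefined collections, can be sketched as a guarded dry run. The plan_remove function is a hypothetical helper, and the list of protected names is taken from the predefined collections mentioned in this chapter. The commands are printed for review, not executed.

```shell
#!/bin/sh
# Print the removal commands for a user-built collection, refusing the
# predefined collections named in this chapter. Output is a dry run.
SUPPER=${SUPPER:-/var/sysman/supper}

plan_remove() {
    case "$1" in
        sup.admin|user.admin|power_system|node.root)
            echo "refusing to remove predefined collection $1" >&2
            return 1 ;;
    esac
    echo "$SUPPER scan $1"      # step 1: build a scan file to review
    echo "$SUPPER remove $1"    # step 2: remove after verification
}

plan_remove tools
```

Remember that the removal has to be repeated on every system where the collection was installed.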

7.5.13 Diagnosing file collection problems 

Common file collection problems and their solutions are summarized in the 
following section: 

Section 16.5, “Diagnosing file collection problems” on page 452 


7.6 SP user files and directories management 

Berkeley Automounter (also known as AMD) and AIX Automounter have been 
used for SP user files and directories management. 

AMD has been used by PSSP 1.2, 2.1 and 2.2. AIX Automounter has been 
used by PSSP 2.3 onwards. 

An NFS automounting system and the SPUM Interface environment allow 
users local access to any files and directories no matter which node they are 
logged on to. 

7.6.1 Berkeley Automounter, AMD 

AMD is used for automatic and transparent mounting and unmounting of NFS 
file systems and is a simple and effective way for managing NFS file systems 
and directories. 

The AMD daemon runs on CWS, boot/install servers, and all nodes on the SP 
system. It monitors specified directory mount points, and when a file I/O 
operation is requested to that mount point, it performs the RPC call to 
complete the NFS mount to the server specified in the automount map files. 

Any mount point directories that do not already exist on the client will be 
created. After a period of inactivity, two minutes by default, the automount 
daemon will attempt to unmount any mounted directories under its control. 
The mounted directories can come from SP boot/install servers or any 
workstation or server on the network. 

AMD is not an IBM product; information about it can be found in the 
compressed file named /usr/lpp/ssp/public/amd_up102.tar.Z. 

There are two types of file maps. These are indirect maps and direct maps. 

• Indirect maps are useful for commonly-used, higher-level directories, 
such as /home. 

• Direct maps are useful when directories cannot be dedicated for 
automount, such as /usr. 

AMD can be enabled by entering the smit enter_data command and selecting 
true for AMD configuration. The system will then run the amd_config Perl 
script. This script, located in the /usr/lpp/ssp/install/bin directory, adds 
the amd_start script to /etc/rc.sp. 


7.6.2 AIX Automounter 

AIX Automounter is a tool that can make the RS/6000 SP system appear as 
only one machine to both the end users and the applications by means of a 
global repository of storage. It manages mounting activities using standard 
NFS facilities. It mounts remote systems when they are used and 
automatically dismounts them when they are no longer needed. 

Because file systems are mounted only while they are in use, the number of 
mounts on a given system is reduced, and problems caused by NFS file server 
outages become less likely. 

On the SP, the Automounter may be used to manage the mounting of home 
directories. It can be customized to manage other directories as well. It 
makes system administration easier because it only requires modification of 
map files on the control workstation (CWS) to enable changes on a 
system-wide basis. 

The following are the steps for Automounter initial configuration: 

1. Use the smit enter_data command on the CWS to perform PSSP 
installation, which displays Site Environment Information. 

2. Add users to the system. 

3. Ensure the amd_config variable is set to true so that the automountd 
(which is the automounter daemon) will start. 

4. Ensure usermgmt_config variable is set so that the maps for the user’s 
home directories will be maintained. 

The AIX Automounter reads automount map files to determine which 
directories to handle under a certain mount point. These map files are kept 
in the /etc/auto/maps directory. The list of map files for the Automounter is 
stored in the /etc/auto.master file. The master files can also be accessed 
by means of NIS. 
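To make the relationship between the master file and a map file concrete, the sketch below writes a sample pair into a scratch directory. The host names, user names, and exported paths are hypothetical, and the map entries use the common key-and-location automount form; the entries that PSSP generates on a real system may carry additional fields.

```shell
#!/bin/sh
# Write sample map files to a scratch directory. Host names, user names,
# and exported paths are hypothetical; entries use the common
# "key location" automount map form.
AUTO_ROOT=${AUTO_ROOT:-/tmp/auto_demo}
mkdir -p "$AUTO_ROOT/auto/maps"

# Master file: the mount point, then the map that controls it.
echo "/u /etc/auto/maps/auto.u" > "$AUTO_ROOT/auto.master"

# Indirect map for /u: one line per user home directory.
cat > "$AUTO_ROOT/auto/maps/auto.u" <<'EOF'
alice cws1:/home/cws1/alice
bob   node5:/home/node5/bob
EOF

cat "$AUTO_ROOT/auto.master" "$AUTO_ROOT/auto/maps/auto.u"
```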

7.6.3 AMD to AIX Automounter migration 

As of PSSP 2.3, use of the public domain BSD automounter, the AMD 
daemon, was replaced with native AIX automounter support, which is 
available as part of NFS in the Network Support Facilities of the AIX Base 
Operating System (BOS) Runtime. The AIX automount daemon is shipped 
with AIX 4.3.0 and earlier releases. In AIX 4.3.1, this daemon was replaced 
with the AutoFS implementation. AMD uses map files to define the 
automounter control. These map files are not compatible with the AIX 
automounter and must be converted. 

7.6.3.1 Migration considerations 

If your current installation has SP automounter support configured (the 
amd_config site environment variable is true) when migrating to PSSP 3.1 
from PSSP 2.2, the system configuration process (services_config) will 
create a new /etc/auto directory structure and default automount 
configuration files. 

If SP User Management services is also configured (the usermgmt_config 
site environment variable is true), your existing /etc/amd/amd-maps/amd.u 
map file will be used to automatically create a new /etc/auto/maps/auto.u map 
file. 

The mkautomap command is a migration command used to generate an 
Automount map file from the AMD map file AMD_map created by a previous 
SP release. Only AMD map file entries created by a previous SP release will 
be recognized. If the AMD map file was modified by the customer, results 
may be unpredictable. If an AMD map entry cannot be properly interpreted, a 
message will be written to standard error, and that entry will be ignored. 
Processing will continue with the next map entry. All recognized entries will 
be interpreted. 

The mkautomap command migrates the /etc/amd/amd-maps/amd.u file and adds 
the /u file system to /etc/auto.master. It also modifies the syslog 
configuration so that errors are directed to the 
/var/adm/SPlogs/SPdaemon.log file. 

7.6.4 Diagnosing AMD and Automount problems 

The following are cross references on AMD and Automount problem 
diagnosis: 

Section 16.4.1, “Problems with AMD” on page 448. 

Section 16.4.2, “Problems with user access or automount” on page 449. 

7.6.5 Coexistence of the AMD and AIX Automounters 

A system may consist of newer nodes running PSSP Version 2.3 as well as 
older nodes running PSSP Versions 2.2 or earlier. Nodes running PSSP 2.3 
use AIX Automounter, and nodes running prior versions use AMD. 

The SP will configure and run the native AIX automounter on the newer 
nodes containing PSSP 2.3 and later releases and the BSD AMD daemon on 
the older nodes containing PSSP 2.2. 

If the SP User Management services have also been configured 
(usermgmt_config site environment variable is also true), the control 
workstation will create and maintain both the automount map file 
/etc/auto/maps/auto.u and the AMD map file /etc/amd/amd-maps/amd.u. The 
spmkuser, spchuser, and sprmuser commands (and their SMIT equivalents) will 
process user home directory entries in both map files. 

If the filecoll_config variable has been set to true under Site Environment 
Information during installation, then the SP is configured to manage file 
collections. In this case, AIX automounter map files will be automatically 
distributed to all the nodes by means of supper. 

7.7 Related documentation 

The following books are recommended readings to provide a broader view on 
user and data management. 

SP Manuals 

PSSP: Administration Guide, SA22-7348. Chapter 4 describes file collections 
thoroughly, covering the concepts, how to create file collections, how they 
work, and so forth. Chapter 5 gives a detailed description of managing user 
accounts, covering how to set up SP users and how to change, delete, and 
list them. 

SP Redbooks 

Inside the RS/6000 SP, SG24-5145. Chapter 4, Section 4.8 contains a 
description of file collections that covers the definition, file collection 
building, installation, organization, maintenance, and so forth. Section 4.9 
covers managing AIX Automounter and its difference from BSD Automounter. 

Technical Presentation on PSSP Version 2.3, SG24-2080. Chapter 5 
contains detailed descriptions of AIX Automounter that cover distribution of 
files, how to create map files, migration from AMD (prior to PSSP V2.3) to AIX 
Automounter, and so forth. 

Study Guides 

IBM Global Services, RS/6000 SP, System Administration: Course Code 
AU96. Unit 2 describes the managing of user accounts, covering 
considerations for setting up and administering users in a distributed system 
and setting up login control. Unit 3 describes the managing of user 
directories, covering automounting of NFS file systems, usage, and setting up 
of the AMD Automounter. Unit 4 covers data management, including file 
collection concepts, how to work with and manage file collections, how to 
build and install them, the difference between using NIS and file 
collections, and so forth. 


7.8 Sample questions 

This section provides a series of questions to aid you in preparation for 
the certification exam. The answers to these questions can be found in 
Appendix A. 

1. Why can passwords not be changed directly on the node if SP User 
Management is being used? 

A. Because there is not a passwd command on the nodes. 

B. Because the /etc/passwd and /etc/security/passwd files are not present 
on the nodes. 

C. Because the /etc/passwd and /etc/security/passwd files get replaced 
with the files from the passwd file server. 

D. Who says it cannot be done? 

2. What is the difference between an AIX user and an SP user? 

A. An AIX user is able to access local resources in a node, while an SP 
user can only access SP related resources. 

B. There is no difference, just terminology. 

C. SP users are global AIX users managed by the SP User Management 
facility on the SP. 

D. An SP user can Telnet to the control workstation, while an AIX user 
cannot. 

3. What is the command you would use to set access control on a node? 

A. spac_block 

B. cntrl_access 

C. restrict_login 

D. spac_cntrl 

4. What is the file collection that contains all the user management related 
files? 

A. user_admin 

B. user.admin 

C. user.mgmt 

D. user_mgmt 

5. Which of the following predefined file collections is NOT shipped with 
PSSP? 

A. node.root 

B. power_system 

C. user.admin 

D. supper.admin 

6. What is the command that is used to report information about file 
collections? 

A. online_collection 

B. supper 

C. offline_collection 

D. update_collection 

7. Which tool allows a system administrator to maintain system 
configuration files in one place? 

A. DFS 

B. NFS 

C. NIS 

D. AFS 

8. What is the default hierarchy sequence of updates for file collections? 

A. Nodes/BIS/CWS 

B. CWS/Nodes/BIS 

C. BIS/CWS/Nodes 

D. CWS/BIS/Nodes 

9. What must be considered prior to performing addition or deletion of files in 
a file collection? 

A. Run the supper install command on the nodes. 

B. If there is no scan file on the master, run the supper status command. 

C. Make sure you are working with the master files. 

D. If the entry is needed in the list file, add entry to /.k file. 

10. Which tool can make the RS/6000 SP system appear as only one machine 
to both the end users and the applications? 

A. NFS 

B. AIX Automounter 

C. NIS 

D. DFS 


7.9 Exercises 

Here are some exercises you may wish to perform: 

1. On a test system that does not affect any users, build a file collection. You 
must be root to perform this exercise. 

2. Install the file collection that you created in the previous exercise. 

3. Display all the file collections on your system. 

4. Remove the file collection you added in exercise 1. Can you remove the 
predefined file collections on your system? Explain. 

5. On a test system that does not affect any users, add, change, list, and 
delete SP users. 

Part 2. Installation and configuration 

Chapter 8. Configuring the control workstation 


This chapter addresses various topics related to the initial configuration of the 
CWS: preparation of the environment, copying of the AIX and PSSP lpp from the 
distribution media to the CWS disks, and initialization of Kerberos services 
and the SDR. These topics are not listed in the chronological order of the CWS 
configuration process. Rather, they are gathered by categories: PSSP 
commands, configuration files, environment requirements, and lpp 
considerations. 


8.1 Key concepts you should study 

Before taking the RS/6000 SP certification exam, you should understand the 
following CWS configuration concepts: 

• PSSP product packaging: Required and optional lpps and filesets. 

• Connectivity between the CWS, SP frames, non-SP frames, and SP 
nodes. 

• Storage requirements and directory organization for PSSP software. 

• AIX system configuration files related to the SP system. 

• CWS configuration commands. 

• Setup of Kerberos authentication services. 


8.2 Summary of CWS configuration 

This section presents a summary of the initial configuration of the CWS. It 
corresponds to: 

• Steps 1 to 21 of the PSSP 2.4 PSSP: Installation and Migration Guide, 
GC23-3898, Chapter 2. 

The initial configuration of the CWS is the part of the PSSP installation where 
you prepare the environment before you start configuring the PSSP software. 
It consists of several steps: 

1. You need to update the AIX system environment: You have to modify the 
PATH of the root user, change the maximum number of processes allowed 
by AIX, customize a few system files, such as /etc/services, and check 
that some system daemons are running. 

2. You must make sure that the AIX system is at the appropriate level (AIX, 
perfagent), and that it matches the prerequisites of the version of PSSP 
you are about to install. 

3. You must check the physical connectivity between the CWS and the SP 
frames and nodes. You cannot start configuring the SP system on the 
CWS until the physical installation is completed. You must then configure 
your TCP/IP network: Addresses, routes, name resolution, tuning of 
network parameters, and so on. The TCP/IP definition of all SP nodes 
must be completed on the CWS before initializing Kerberos services and 
before configuring the SDR. This step is critical to the success of the SP 
system installation. Please refer to Chapter 3, “RS/6000 SP networking” 
on page 75 for more detail on the TCP/IP configuration step. 

4. You must allocate disk space on the CWS for storing the PSSP software, 
and restore it from the distribution media. 

5. You have to install PSSP on the CWS using the installp command. 

6. You must configure authentication services on the CWS either by using 
the Kerberos implementation distributed with PSSP or by using another 
Kerberos implementation. 

7. Finally, you have to initialize the SDR database that will be used to store 
all your SP system configuration information. 

The tasks described in steps 1 through 4 can be executed in any order. 

Steps 5, 6, and 7 must be performed in this order after all other steps. 

The following sections describe in more detail the commands, files, and 
concepts related to these seven steps. 


8.3 Key commands 

The commands described in this chapter are to be used only on the CWS and 
not on the SP nodes. 

8.3.1 setup_authent 

This command takes no arguments. It configures the Kerberos authentication 
services for the SP system. The setup_authent command first searches the 
AIX system for Kerberos services already installed, checks for the existence 
of Kerberos configuration files, and then enters an interactive dialog where 
you are asked to choose and customize the authentication method to use for 
the management of the SP system. You can choose to use the SP-provided 
Kerberos services, another already existing Kerberos V4 environment, or an 
AFS-based Kerberos service. If you choose the SP-provided Kerberos 
services, setup_authent will initialize the primary authentication server on the 
CWS. 

8.3.2 install_cw 

This command takes no arguments. It is used after the PSSP software has been 
installed on the CWS and after the Kerberos authentication services have 
been initialized. The install_cw command performs the initial customization 
of PSSP on the CWS (setup of PSSP SMIT panels, initialization of the SDR, 
and so on), configures the default partition, and starts the SP daemons 
necessary for the following steps of the SP installation. 


8.4 Key files 

Before the installation of PSSP software on the CWS, you have to modify 
several AIX system files. These changes can be done in any order, as long as 
they are done before using the setup_authent and install_cw commands. 

8.4.1 .profile, /etc/profile, or /etc/environment 

The root user (during installation) and any user chosen for SP system 
administration (during SP operation) need to have access to the PSSP 
commands. For each of these users, depending on your site policy, one of 
the files $HOME/.profile, /etc/profile, or /etc/environment has to be modified 
so that the PATH environment variable contains the directories where the 
PSSP and Kerberos commands are located. 

For $HOME/.profile or /etc/profile, add the lines: 

PATH=$PATH:/usr/lpp/ssp/bin:/usr/lib/instl:/usr/sbin:\ 
/usr/lpp/ssp/kerberos/bin 
export PATH 

For /etc/environment, add the line: 

PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:\ 
/usr/lpp/ssp/bin:/usr/lib/instl:/usr/lpp/ssp/kerberos/bin 
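After editing one of these files and logging in again, a quick check such as the following sketch can confirm that the directories are present in PATH. The missing_path_dirs function is a hypothetical helper; it prints each required directory that is not yet in PATH and prints nothing when all are present.

```shell
#!/bin/sh
# Report which of the directories listed above are missing from PATH.
missing_path_dirs() {
    for d in /usr/lpp/ssp/bin /usr/lib/instl /usr/sbin \
             /usr/lpp/ssp/kerberos/bin; do
        case ":$PATH:" in
            *":$d:"*) ;;            # already present
            *) echo "$d" ;;         # not in PATH
        esac
    done
}

missing_path_dirs
```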


8.4.2 /etc/inittab 

This file is used to define several commands that are to be executed by the 
init command during an RS/6000 boot process. 

On the CWS, you must make sure that this file starts the AIX System 
Resource Controller (SRC). The srcmstr entry of the CWS /etc/inittab must be 
uncommented and look like: 

srcmstr:2:respawn:/usr/sbin/srcmstr # System Resource Controller 
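The srcmstr requirement can be verified with a short check like the one sketched below. The has_srcmstr function is a hypothetical helper; it relies on the AIX convention that an inittab entry beginning with a colon is commented out, so only a line that starts with srcmstr: counts as active.

```shell
#!/bin/sh
# Check an inittab file for an active srcmstr entry. On AIX, an entry
# that begins with ":" is commented out, so only lines that start
# with "srcmstr:" count.
has_srcmstr() {
    grep -q '^srcmstr:' "${1:-/etc/inittab}" 2>/dev/null
}

if has_srcmstr; then
    echo "srcmstr entry is active"
else
    echo "srcmstr entry is missing or commented out"
fi
```

On AIX, the lsitab srcmstr command offers another way to display the entry.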


/etc/inittab is also used to define which PSSP daemons are to be started at 
boot time. It is updated automatically during the PSSP installation with the 
entries sdrd, sp, fsd, hardmon, sysctld, st_swnum, spmgr, kerb, kadm, splogd, 
swtlog, swtadmd, hats, hags, haem, hr, pman, and sp_configd. 


8.4.3 /etc/inetd.conf 

On the CWS, the inetd daemon configuration must contain the uncommented 
entries bootps and tftp. If they are commented prior to the PSSP installation, 
you must uncomment them manually. The PSSP installation scripts will not 
check or modify these entries. 
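Because the installation scripts do not check these entries, it can help to check them yourself before installing. The following is a minimal sketch: the inactive_services function is a hypothetical helper that prints the name of each required service without an uncommented entry in the given file (by default /etc/inetd.conf).

```shell
#!/bin/sh
# List which of the required services lack an uncommented entry in an
# inetd configuration file (default /etc/inetd.conf).
inactive_services() {
    conf=${1:-/etc/inetd.conf}
    for svc in bootps tftp; do
        grep -q "^$svc[[:space:]]" "$conf" 2>/dev/null || echo "$svc"
    done
}

inactive_services
```

After uncommenting an entry on AIX, the inetd daemon has to reread its configuration, for example with refresh -s inetd.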

8.4.4 /etc/rc.net 

For improving networking performance, you can modify this file on the CWS 
to set network tunables to the values that fit your SP system by adding the 
following lines: 

# additions for tuning of SP-PSSP system 
no -o thewall=16384 
no -o sb_max=163840 
no -o ipforwarding=1 
no -o tcp_sendspace=65536 
no -o tcp_recvspace=65536 
no -o udp_sendspace=32768 
no -o udp_recvspace=65536 
no -o tcp_mssdflt=1448 

The rc.net file is also the recommended location for setting any static routing 
information. In particular, the CWS needs to have IP connectivity to each of 
the SP nodes’ en0 adapters during the installation process. In the case where 
the CWS and all nodes’ en0 adapters are not on the same Ethernet segment 
(for example, when there are several frames), the rc.net file of the CWS can 
be modified to include a routing statement. 

For example, in our environment, we would add: 

/usr/sbin/route add -net 192.168.31.0 192.168.3.11 

8.4.5 /etc/services 


There is a conflict in the use of port 88 by Kerberos V4 (as used by AFS) and 
by Kerberos V5 (assigned to DCE by AIX 4.1). The /etc/services file can be 
used to resolve this problem if you decide to use the AFS authentication 
services by adding the line: 

kerberos4 88/udp # Kerberos V4 - added for PSSP 

An alternative solution is to reconfigure the AFS authentication server to use 
another port (750). 


8.5 Environment requirements 

Before starting the installation of the PSSP software onto the CWS, you must 
prepare the hardware and software environment and pay attention to some 
rules and constraints. 

8.5.1 Connectivity 

During normal operation, the TCP/IP connectivity needed for user 
applications between the CWS and the SP nodes can be provided through 
any type of network (Token Ring, FDDI, ATM) supported by the RS/6000 
hardware. However, for the installation and the management of the SP nodes 
from the CWS, there must exist an Ethernet network connecting all SP nodes 
to the CWS. This network may consist of several segments. In this case, the 
routing between the segments is provided either by one (or several) SP 
node(s) with multiple Ethernet adapters, Ethernet hubs, or Ethernet routers. 

Furthermore, the monitoring and control of the SP frames and nodes 
hardware from the CWS requires a serial connection between the CWS and 
each frame in the SP system. If there are many frames, there may not be 
enough built-in serial adapters on the CWS, and additional serial adapter 
cards may need to be installed in the CWS. 

In the case that SP-Attached Servers (RS/6000 S70, S7A or S80) are 
included in the SP system, two serial cables are needed to link the CWS to 
each of the servers. An Ethernet connection is also mandatory between the 
CWS and the server configured on the en0 adapter of the server. 

Also, note that the CWS cannot be connected to an SP Switch (no css0 
adapter in the CWS). 

The connectivity between the CWS, the frames, and the SP nodes through 
the serial links, and between the CWS and the nodes through the Ethernet 
network, must be set up before starting the PSSP installation. 

8.5.2 Disk space and file system organization 

Before starting installation of the PSSP software, plan the allocation of disk 
space dedicated to the storage of the product code as well as to the archiving 
of the AIX images (mksysb) loaded on each SP node. This disk space is 
organized in a directory structure that must comply with naming conventions. 

8.5.2.1 /spdata size and disk allocation 

Most of the PSSP-related disk storage is allocated within the /spdata 
directory. (A small amount of storage is also needed in /usr and /var.) The 
exact size that must be available on the CWS and on each boot/install server 
can be computed using the formulas in Chapter 3 of RS/6000: Planning 
Volume 2, GA22-7281. The exact size depends on the installation of optional 
filesets and the number of mksysb images to be kept on the CWS. However, 
some rules of thumb can be used for a rough evaluation of the size. For a 
simple configuration, with the same system image installed on all nodes and 
AIX with a reasonable number of options, 1.8 GB will be necessary. For a system 
with n images (archiving backup images or using different images for different 
nodes), the size will be on the order of (1100+(n * 700)) MB. Note that the 
same image can be used for uniprocessor or multiprocessor nodes. 

Keep in mind that this rule provides only a very rough estimate. As a point of 
comparison, the minimum system image (spimg) provided with PSSP is 91 
MB versus an estimated 700 MB for the system images considered in this 
rule of thumb. 
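
The rule of thumb can be tried out with a few lines of shell arithmetic. This is only a sketch: spdata_estimate_mb is a hypothetical helper name, and the constants 1100 and 700 are the rough figures quoted above, not exact requirements.

```shell
#!/bin/sh
# Rough /spdata size estimate in MB for n system images, using the rule
# of thumb quoted in the text: 1100 + (n * 700).
# spdata_estimate_mb is a hypothetical helper, not a PSSP command.
spdata_estimate_mb() {
    n=$1
    echo $((1100 + n * 700))
}

# One image used on all nodes: about 1.8 GB.
echo "1 image : $(spdata_estimate_mb 1) MB"
# Three images (for example, two node images plus one archived backup).
echo "3 images: $(spdata_estimate_mb 3) MB"
```

For exact sizing, the formulas in the planning manual remain the reference.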

It is recommended, but not required, to dedicate a volume group of the CWS 
to the /spdata directory. The decision for creating such a volume group must 
take into account the backup strategy that you will choose. The root volume 
group can be backed up using the mksysb command to create a bootable 
image, while other volume groups can be saved using the savevg command. 
Since no file in the /spdata directory is needed to restore the
CWS from a bootable image, the /spdata directory does not need to be part of
the CWS mksysb. Furthermore, the contents of the /spdata directory change
when the systems installed on the SP nodes are modified (for example, when
new node system images are created), which is unlikely to coincide with
changes to the contents of the CWS rootvg. The backup schedules for the
CWS root volume group and for the /spdata directory are, therefore,
generally independent.

8.5.2.2 /spdata directory structure and naming convention

Before beginning the PSSP installation, you must manually create the
/spdata directory with a minimum substructure consisting of the
directories shown in Figure 104.


/spdata/sys1/install/
/spdata/sys1/install/<source_name>
/spdata/sys1/install/<source_name>/lppsource
/spdata/sys1/install/aix431
/spdata/sys1/install/aix431/lppsource
/spdata/sys1/install/aix432
/spdata/sys1/install/aix432/lppsource
/spdata/sys1/install/images
/spdata/sys1/install/pssp
/spdata/sys1/install/pssplpp
/spdata/sys1/install/pssplpp/PSSP-2.2
/spdata/sys1/install/pssplpp/PSSP-2.3
/spdata/sys1/install/pssplpp/PSSP-2.4
/spdata/sys1/install/pssplpp/PSSP-3.1


Figure 104. /spdata initial structure 

The installable images (lpp) of the AIX systems must be stored in directories
named /spdata/sys1/install/<source_name>/lppsource. You can set
<source_name> to the name you prefer. However, it is recommended to use a
name identifying the version of the AIX lpps stored in this directory. The
names generally used are aix421, aix431, and so on.

Except for <source_name>, the name of all directories listed in Figure 104 
must be left unchanged. 
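
As an illustration, the minimum substructure can be created with mkdir -p. The sketch below builds the tree in a scratch directory so that it can be tried safely; make_spdata_tree and the scratch location are hypothetical names chosen for this example, and on a real CWS the root would be /spdata itself.

```shell
#!/bin/sh
# Create the minimum /spdata substructure of Figure 104 under a given root.
# make_spdata_tree is a hypothetical helper, not a PSSP command.
make_spdata_tree() {
    root=$1          # for example /spdata
    source_name=$2   # for example aix432
    pssp_level=$3    # for example PSSP-3.1
    mkdir -p "$root/sys1/install/$source_name/lppsource" \
             "$root/sys1/install/images" \
             "$root/sys1/install/pssp" \
             "$root/sys1/install/pssplpp/$pssp_level"
}

# Demonstration in a scratch directory, removed again at the end.
scratch=${TMPDIR:-/tmp}/spdata.$$
make_spdata_tree "$scratch" aix432 PSSP-3.1
find "$scratch" -type d | sort
rm -rf "$scratch"
```

Only <source_name> and the PSSP level are parameters here, which mirrors the rule that all other directory names must be left unchanged.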

In Section 8.5.2.1, “/spdata size and disk allocation” on page 256, we have 
mentioned one possibility of allocation of /spdata based on a backup strategy. 
We now present another possibility based on the contents of the 
subdirectories of /spdata. Instead of dedicating a volume group to /spdata, 
you can spread the information contained in the /spdata directory between 
the rootvg and another volume group (for example, let us call it spstdvg). All 
information that can be easily re-created is stored in spstdvg, while all 
information that is manually created during the installation of the SP system 
is stored in rootvg. The advantage of this solution is to enable the backup of 
critical SP information along with the rest of the AIX system backup using the 
mksysb command, while all information that is not critical can be backed up 
independently with a different backup frequency (maybe only once at
installation time). In practice, this means that you create one file system
on the spstdvg volume group to hold each of the following directories:

/spdata/sys1/install/<source_name>

/spdata/sys1/install/images

/spdata/sys1/install/pssplpp

These file systems are then mounted over their mount points in rootvg.

Another advantage of this solution is that these directories contain most of 
the /spdata information. The remaining subdirectories of /spdata represent 
only around 30 MB. This solution, therefore, enables you to keep the size of 
the rootvg to a reasonable value for creating mksysb bootable images. 


8.6 LPP filesets 

An SP system requires at least the installation of AIX, Perfagent, and PSSP.
Each of these products consists of many filesets, but only a subset of them
is required to be installed. The following sections explain which filesets
need to be installed depending on the configuration of the SP system.

8.6.1 PSSP prerequisites 

The PSSP software has prerequisites on the level of AIX installed on the
CWS as well as on optional lpps. These requirements are different for each
release of PSSP.

The minimum set of AIX components to be copied to the
/spdata/sys1/install/<source_name>/lppsource directory is shown in Table 15.


Table 15. Minimum AIX LPP requirements 


bos
bos.diag.*
bos.mp.*
bos.net.*
bos.powermgt.*
bos.sysmgt.*
bos.terminfo.*
bos.up.*
bos.64bit
devices.*
xlC.rte.*
X11.apps.*
X11.base.*
X11.compat.*
X11.Dt.*
X11.fnt.*
X11.loc.*
X11.motif.*
X11.msg.*
X11.vsm.*


For installation on AIX release 4.2 or earlier, you also need to install
the bos.info.* filesets.


In addition, the right level of perfagent must be installed on the CWS and 
copied to each /spdata/sys1/install/<source_name>/lppsource directory, 
according to Table 16. 

Table 16. Perfagent filesets

AIX level    PSSP level    Required filesets
AIX 4.1.5    PSSP 2.2      perfagent.server 2.1.5.x
AIX 4.2.1    PSSP 2.2      perfagent.server 2.2.1.x or greater, where x>=2
AIX 4.2.1    PSSP 2.3      perfagent.server 2.2.1.x or greater, where x>=2
AIX 4.3.1    PSSP 2.3      perfagent.server 2.2.31.x
AIX 4.3.1    PSSP 2.4      perfagent.server 2.2.31.x
AIX 4.3.2    PSSP 2.3      perfagent.tools and perfagent.server 2.2.32.x
AIX 4.3.2    PSSP 2.4      perfagent.tools and perfagent.server 2.2.32.x
AIX 4.3.2    PSSP 3.1      perfagent.tools 2.2.32.x
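
When preparing for the exam, it can help to treat Table 16 as a simple lookup keyed on the AIX and PSSP levels. The following sketch encodes the table in a shell case statement; required_perfagent is a hypothetical helper, and the level 2.2.31.x is assumed for both PSSP levels on AIX 4.3.1.

```shell
#!/bin/sh
# Look up the perfagent filesets of Table 16 for an AIX and a PSSP level.
# required_perfagent is a hypothetical helper, not a PSSP command.
required_perfagent() {
    case "$1:$2" in
        4.1.5:2.2)           echo "perfagent.server 2.1.5.x" ;;
        4.2.1:2.2|4.2.1:2.3) echo "perfagent.server 2.2.1.x or greater (x >= 2)" ;;
        # 2.2.31.x is assumed for both PSSP levels on AIX 4.3.1.
        4.3.1:2.3|4.3.1:2.4) echo "perfagent.server 2.2.31.x" ;;
        4.3.2:2.3|4.3.2:2.4) echo "perfagent.tools and perfagent.server 2.2.32.x" ;;
        4.3.2:3.1)           echo "perfagent.tools 2.2.32.x" ;;
        *) echo "combination not listed in Table 16" >&2; return 1 ;;
    esac
}

required_perfagent 4.3.2 3.1
required_perfagent 4.3.2 2.4
```

Note how PSSP 3.1 on AIX 4.3.2 needs only perfagent.tools, while PSSP 2.3 and 2.4 on the same AIX level need both filesets.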


8.6.2 PSSP filesets 

The installation of the PSSP software on the CWS disks is a two-step 
process. 

1. All PSSP software is first restored from the distribution media into the
/spdata/sys1/install/pssplpp/PSSP-x.x directory using the bffcreate
command.

The ssp.usr file must then be renamed to pssp.installp, and the table of
contents must be regenerated (execute the inutoc command). The exact
filename of ssp.usr depends on the PSSP version:

PSSP 2.4: ssp.usr.2.4.0.0

PSSP 3.1: ssp.usr.3.1.0.0

2. Part of the code in the /spdata/sys1/install/pssplpp/PSSP-x.x directory is
then installed (installp) on the CWS. You must first choose which of the
PSSP filesets you need to install.

The filesets to be installed depend on the version of PSSP. 
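
The two steps can be summarized as a dry run that prints the commands instead of executing them. This is a sketch under stated assumptions: the device name /dev/rmt0, the flag combinations, and the fileset list are illustrative choices, and pssp_install_plan is a hypothetical helper. Check the PSSP: Installation and Migration Guide for the exact invocation at your level.

```shell
#!/bin/sh
# Dry run of the two-step PSSP installation on the CWS: the commands are
# printed, not executed. Device, flags, and filesets are assumptions made
# for illustration; pssp_install_plan is a hypothetical helper.
pssp_install_plan() {
    level=$1        # for example 3.1
    device=$2       # for example /dev/rmt0
    shift 2         # remaining arguments: filesets to install
    dir=/spdata/sys1/install/pssplpp/PSSP-$level

    # Step 1: copy the images from the media, rename ssp.usr, rebuild the TOC.
    echo "bffcreate -qvX -t $dir -d $device all"
    echo "mv $dir/ssp.usr.$level.0.0 $dir/pssp.installp"
    echo "inutoc $dir"

    # Step 2: install the chosen filesets from that directory.
    echo "installp -agX -d $dir $*"
}

pssp_install_plan 3.1 /dev/rmt0 ssp.basic ssp.clients ssp.sysctl
```

Printing the plan first makes it easy to review the directory and file names before anything is changed on the CWS.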

8.6.2.1 PSSP 2.4 filesets

For PSSP 2.4, Table 17, Table 18, Table 19, and Table 20 list the filesets that
are, respectively, always required, required when using an SP Switch,
required when using an SP Switch router, and optional.

Table 17. PSSP 2.4 required filesets

Fileset Description                 Fileset Name
SP System Support package           ssp.basic
Authentication Client Commands      ssp.clients
System Monitor and Perspectives     ssp.gui
Sysctl                              ssp.sysctl
Availability subsystems             ssp.ha
Topology services                   ssp.topsvcs


Table 18. PSSP 2.4 required filesets (with an SP Switch)

Fileset Description                 Fileset Name
Communication subsystem             ssp.css
Communication subsystem topology    ssp.top


Table 19. PSSP 2.4 required filesets (with an SP Switch router)

Fileset Description                 Fileset Name
Extension nodes SNMP manager        ssp.spmgr


Table 20. PSSP 2.4 optional packages

Fileset Description                          Fileset Name
System management tools (NTP, file           ssp.sysman
collection, and so on)
Resource manager                             ssp.jm
Public tools                                 ssp.public
PSSP documentation                           ssp.docs
Authentication Server: This package is       ssp.authent
compulsory if you wish to use the CWS as
the master Kerberos authentication server
for the PSSP provided authentication
method.
System Partitioning Aid                      ssp.top.gui
Job Switch Resource Table Services           ssp.st
Perl                                         ssp.perlpkg
Problem management                           ssp.pman
High Availability Control Workstation        ssp.hacws
Performance Monitor                          ssp.ptpegui
Virtual Shared Disk support                  ssp.csd.vsd, ssp.csd.cmi, ssp.csd.hsd,
                                             ssp.csd.sysctl, ssp.csd.gui
Minimal AIX mksysb images: This              spimg
package is only needed if the user does
not provide his own mksysb for the nodes.
Supervisor Microcode                         ssp.ucode

8.6.2.2 PSSP 3.1 filesets

For PSSP 3.1, Table 21, Table 22, Table 23, and Table 24 describe the filesets
that are, respectively, always required, required to support an SP Switch,
required to support an SP Switch router, and optional.

Table 21. PSSP 3.1 required filesets

Fileset Description                 Fileset Name
Cluster Technology                  rsct.basic.hacmp, rsct.basic.rte,
                                    rsct.basic.sp, rsct.clients.hacmp,
                                    rsct.clients.rte, rsct.clients.sp
SP System Support package           ssp.basic
Authentication Client Commands      ssp.clients
Compatibility package               ssp.ha_topsvcs.compat
Perl                                ssp.perlpkg
Sysctl                              ssp.sysctl
System management tools (NTP, file  ssp.sysman
collection, and so on)


Table 22. PSSP 3.1 required filesets (with an SP Switch)

Fileset Description                 Fileset Name
Communication subsystem             ssp.css
Communication subsystem topology    ssp.top


Table 23. PSSP 3.1 required filesets (with an SP Switch router)

Fileset Description                 Fileset Name
Extension nodes SNMP manager        ssp.spmgr


Table 24. PSSP 3.1 optional filesets

Fileset Description                          Fileset Name
Performance Toolbox Parallel Extensions      ptpe.docs, ptpe.program
Minimal AIX mksysb images: This              spimg
package is only needed if you do not
provide your own mksysb for the nodes.
Authentication Server: This package is       ssp.authent
compulsory if you wish to use the CWS as
the master authentication server for the
PSSP provided authentication method.
PSSP Documentation                           ssp.docs, ssp.resctr.rte
Perspectives                                 ssp.gui
High Availability Control Workstation        ssp.hacws
Problem Management                           ssp.pman
Performance Monitor                          ssp.ptpegui
Public Tools                                 ssp.public
Job Switch Resource Table Services           ssp.st
TEC Event Adapter                            ssp.tecad
System Partitioning Aid                      ssp.top.gui
Supervisor Microcode                         ssp.ucode
Virtual Shared Disk support                  vsd.cmi, vsd.hsd, ssp.vsdgui, vsd.sysctl,
                                             vsd.vsddd
Recoverable Virtual Shared Disks             vsd.rvsd.hc, vsd.rvsd.rvsdd,
                                             vsd.rvsd.scripts


8.7 Related documentation 

For complete reference and ordering information for the documents listed in 
this section, see Appendix D, “Related publications” on page 501. 

SP manuals 

You can refer to two sets of documents related to either Version 2.4 or 
Version 3.1 of PSSP: 

RS/6000 SP: Planning Volume 2, GA22-7281. Chapters 2, 3, and 5 provide
detailed information about the SP connectivity, storage requirements, and site 
information. 

PSSP: Installation and Migration Guide, GC23-3898. Chapter 2 describes in 
detail the installation of the CWS, the PSSP packaging, the system 
configuration files, and the authentication services. 

PSSP: Command and Technical Reference, GC23-3900, for PSSP 2.4 or 
PSSP: Command and Technical Reference, SA22-7351 for PSSP 3.1 
contains a complete description of each CWS installation command listed in 
8.3, “Key commands” on page 252. 

SP Redbooks

RS/6000 SP: PSSP 2.2 Survival Guide, SG24-4928. Chapter 2 describes the
logical flow of steps that make up the installation process.

Inside the RS/6000 SP, SG24-5145. Chapter 5 contains a high-level 
description of the installation process. 


8.8 Sample questions 

This section provides a series of questions to help you prepare for
the certification exam. The answers to these questions can be found in
Appendix A.

1. The install_cw script performs the initial customization of PSSP onto the 
control workstation, configures the default partition, and starts the SP 
daemons necessary for the following steps of the SP installation. Which of 
the following is NOT done by the install_cw script? 

A. Creates RS/6000 SP SMIT panels 

B. Initializes the SDR 

C. Creates and starts the hardmon daemon 

D. Creates and starts the partition-sensitive daemons 

2. Which of the following is NOT a pre-requisite for PSSP 3.1? 

A. AIX 4.3.2 

B. perfagent.server.2.2.32.X 

C. perfagent.tools.2.2.32.X 

D. xlC.rte

3. Which filesets are a pre-requisite for PSSP 3.1? 

A. rsct.basic and rsct.clients 

B. rsct.basic.sp and rsct.clients.sp 

C. ssp.ha and ssp.clients 

D. ssp.topsvc and ssp.hacws 

4. What is the recommended location for setting any static routing 
information? 

A. rc.network file 

B. rc file 

C. rc.route file 

D. rc.net file

5. What type of connection is required between the control workstation and 
each frame in the SP system for monitoring and controlling of the SP 
frames and nodes hardware? 

A. Serial connection 

B. Parallel connection 

C. SAMI connection 

D. ATM connection 

6. Which is true about the setup_authent script? 

A. This command has two arguments. 

B. Checks for the existence of Network configuration files. 

C. Searches the AIX system for Kerberos services already installed. 

D. Selects the authentication method to use for the management of the 
SP system. 

7. Which of the following statements is true when connecting an SP-Attached 
server to the control workstation? 

A. An ATM connection is mandatory. 

B. An Ethernet connection is mandatory. 

C. Four serial cables are needed to link the CWS to the server. 

D. One serial cable is needed to link the CWS to the server. 

8. Which filesets are a prerequisite for PSSP 3.1 with an SP Switch? 

A. ssp.st and ssp.public 

B. ssp.spmgr and ssp.authent 

C. ssp.css and ssp.top 

D. ssp.ucode and ssp.top.gui 

9. Where must the installable images (lpp) of the AIX systems be stored?

A. /spdata/sys1/install/<source_name>/lppsource 

B. /spdata/sys2/<source_name>/lppsource 

C. /spdata/sys2/install/<source_name>/lppsource 

D. /spdata/sys1/<source_name>/lppsource 

10. Which of the following are required filesets for PSSP 2.4 and AIX 4.3.2 on 
the control workstation? (Select 2) 

A. perfagent.server 2.1.5.x

B. perfagent.server 2.2.1.x

C. perfagent.tools 2.2.32.x 

D. perfagent.server 2.2.32.x 


8.9 Exercises 

Here are some exercises you may wish to perform: 

1. On a test system that does not affect any users, perform the migration of 
the CWS to a new version of AIX and PSSP. 

2. On a test system that does not affect any users, modify the network 
tunables on the CWS to the values that fit your SP system.

3. Refer to the study guide test environment on page 3 for the following 
question: The control workstation and all nodes in frame 1 and frame 2 are to 
be within the same TCP/IP segment. Which netmask makes this possible? 

4. Familiarize yourself with the following key commands: setup_authent and
install_cw. (Note: These commands are to be used only on the control
workstation and not on the SP nodes.)

5. Familiarize yourself with the various PSSP and AIX software requirements 
for the control workstation. These requirements are different for each release 
of PSSP and AIX. 


Chapter 9. Frame and node installation


In Chapter 8, “Configuring the control workstation” on page 251, we 
presented the initial configuration of the CWS. This chapter addresses all of 
the other steps of the installation of an SP system from the configuration of 
the PSSP software on the CWS through the installation of AIX and PSSP on 
SP nodes up to the first boot of nodes and switches. 


9.1 Key concepts you should study 

Before taking the RS/6000 SP certification exam, you should understand the 
following frame, nodes, and switch installation concepts: 

• Structure of the SDR configuration information: Site information, frame 
information, and node information 

• Contents of the predefined subdirectories of /spdata

• Files used for SDR configuration and SP frames, nodes, and switches 
installation 

• NIM concepts applied to the SP environment 

• Setup of boot/install servers (primary and secondary) 

• Network-installation concepts 

• Automatic and manual node conditioning 

• SP system customization 

• SP partitioning and its impact on commands and daemons 


9.2 Installation steps and associated key commands 

This section presents the commands most widely used during an SP system 
configuration and installation. 

To help you understand the use of each command, the commands are presented
with the installation step in which they are performed and in the order in
which they are first used during the installation process. Some of these
commands may be used several times during the initial installation and the
upgrades of an SP system. In this case, we also provide information that is
not related to the installation step but that you may need at a later stage in
the life of your SP system.

Finally, this section is not intended to replace the SP manuals referenced in
9.4, “Related documentation” on page 293. You should refer to these
manuals to get a thorough understanding of these commands before taking
the SP certification exam.

9.2.1 Enter site environment information 

At this stage, we suppose that the PSSP software has been loaded on the
CWS and that the SDR has just been initialized (the last command executed on
the CWS was install_cw). We are now at the beginning of the SP system
customization and installation.

The first task is to define in the SDR the site environment data used by the
installation and management scripts. This can be done using the command
line interface, spsitenv, or its equivalent SMIT window, the Site Environment
Information window (smitty site_env_dialog). This must be executed on the
CWS only.

The spsitenv command defines all site-wide configuration information (name 
of the default installable image, NTP, and so on) and which of the optional 
PSSP features will be used (SP User Management, SP File Collection 
Management, SP Accounting). 

Because of the number of parameters you must provide in the spsitenv 
command, we recommend that you use the SMIT interface rather than the 
command line. 

In our environment, the site configuration is defined as shown in Figure 105 
on page 269. 

                     Site Environment Information

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                             [Entry Fields]
  Default Network Install Image              [bos.obj.ssp.432]
  Remove Install Image after Installs         false               +

  NTP Installation                            consensus           +
  NTP Server Hostname(s)                     [""]
  NTP Version                                 3                   +

  Automounter Configuration                   true                +

  Print Management Configuration              false               +
  Print system secure mode login name        [""]

  User Administration Interface               true                +
  Password File Server Hostname              [sp3en0]
  Password File                              [/etc/passwd]
  Home Directory Server Hostname             [sp3en0]
  Home Directory Path                        [/home/sp3en0]

  File Collection Management                  true                +
  File Collection daemon uid                 [102]
  File Collection daemon port                [8431]               #

  SP Accounting Enabled                       false               +
  SP Accounting Active Node Threshold        [80]                 #
  SP Exclusive Use Accounting Enabled         false               +
  Accounting Master                          [0]

  Control Workstation LPP Source Name        [aix432]

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do



Figure 105. Site environment information 


9.2.2 Enter frame information 

After defining the site environment in the SDR, you must describe in the SDR
the frames existing in your SP system and how they are numbered. The frame
configuration information entered into the SDR is the association between
the frame number and the tty port on the CWS to which the frame is attached
through a serial link.

This task is performed using either the command line interface, spframe, or its
SMIT equivalent windows: SP Frame Information (smitty sp_frame_dialog)
and non-SP Frame Information (smitty nonsp_frame_dialog). This task must
be executed on the CWS only.

Starting with PSSP 3.1, the spframe command also defines the hardware
protocol used on the serial link (SP for SP nodes, SAMI for SP-Attached
Servers) and the switch port to which a non-SP frame is attached.

This command must be performed during the first installation of an SP system
and also each time a new frame is added to the system. By specifying the
start_frame argument for each frame in the SP system, it is possible to skip
frame numbers, leaving room for system growth and the later addition of
frames in between the frames originally installed in the system.

In our environment, we define the first frame using: 

spframe -r yes 1 1 /dev/tty0

The second frame will be defined later in Chapter 15, “RS/6000 SP 
reconfiguration and update” on page 405. 

9.2.3 Check the level of supervisor microcode 

Once the frames have been configured, and before starting to configure the
nodes, we recommend checking that the frame microcode, known as
supervisor code, is at the latest level supported by the PSSP being installed.
The PSSP software contains an optional fileset, ssp.ucode, which must have
been installed on the CWS to perform this operation.

The spsvrmgr command manages the supervisor code on the SP frames. It
executes on the CWS only. It can be called from the command line or from
SMIT. Each of the command line options is equivalent to one of the functions
accessible from the SMIT RS/6000 SP Supervisor Manager window.

It can be used to query the level of the supervisor code or to download
supervisor code from the CWS onto the SP frame. We recommend using the
SMIT panels to perform these operations. However, two commands can be
used for system-wide checking and update:

• spsvrmgr -G -r status all 

indicates if the supervisor microcode is up to date or needs to be 
upgraded. 

• spsvrmgr -G -u all 

updates the supervisor microcode on all parts of the SP system. 

Since the -u option usually powers off the target of the spsvrmgr command, 
it is highly recommended to upgrade the SP system at the beginning of the 
SP system installation rather than later when the system is in production. 

9.2.4 Check the previous installation steps

Since the complete SP system installation is a complex process involving
more than 50 steps, it is a good idea to perform some checking at several
points of the process to ensure that the steps already executed were
successful. Chapter 10, “Verification commands and methods” on page 297
addresses the various aspects of SP system checking. We will, therefore, not
mention the actions to perform at each checkpoint in this chapter but only
the most useful command: splstdata

This command executes on the CWS or any SP node when using the 
command line interface. It can only be used on the CWS when called from 
one of the SMIT windows accessible from the List Database Information 
window (smitty list_data). 

This command displays configuration information stored in the SDR about the 
frames and nodes. 

This command has many options. During the SP installation, the most useful 
ones are: 

-n for node general configuration
-b for node boot/installation configuration 
-a for node adapters configuration 
-f for frame information 

At this point in the installation, you can use splstdata -f and splstdata -n to
verify that the frames have been correctly configured in the SDR and that the
execution of the spframe command has correctly discovered the nodes in
each frame.

9.2.5 Define the nodes Ethernet information 

Once the frame information has been configured, and the microcode level is
up-to-date, you have to define in the SDR the IP addresses of the en0
adapters of each of the SP nodes as well as the type and the default route for
this Ethernet adapter.

This task is performed by the spethernt command, which executes only on
the CWS, on the command line, or through its equivalent SMIT window: SP
Ethernet Information (smitty sp_eth_dialog).

The spethernt command can define adapters individually, by group, or by
ranges.

In our environment, we need to use the command three times to define all
adapters since the en0 adapter of node 1, the en0 adapters of nodes 5 to 9,
and the en0 adapters of nodes 10 to 15 are in three different ranges of
sequential Ethernet IP addresses and because the default route is different
for node 1 than for the other nodes:

spethernt -s yes -t bnc 1 1 192.168.3.11 192.168.3.130
spethernt -s yes -t bnc 5 5 192.168.31.15 192.168.31.11
spethernt -s yes -t bnc 10 6 192.168.31.110 192.168.31.11

Figure 106 shows which adapters are defined by each of these commands. 

[Diagram: Frame 1 holds nodes 1 and 5 to 15 (sp3n01, sp3n05 to sp3n15) and
the switch. The en0 adapter of node 1 and the CWS (sp3en0, 192.168.3.130)
are on the 192.168.3.xx segment. The en1 adapter of node 1 (192.168.31.11)
and the en0 adapters of nodes 5 to 15 (node 15 is 192.168.31.115) are on the
192.168.31.xx segment. The diagram associates each of the three spethernt
commands with the adapters it defines.]

Figure 106. Definition of additional adapters 
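
To check a range before entering it, the sequential numbering can be reproduced in a few lines of shell. This sketch assumes that the two numbers before the starting IP address are read as the first node and the node count, and that addresses increase by one in the last octet, which is what the node 15 address in Figure 106 suggests. expand_range is a hypothetical helper that only prints the pairs and does not touch the SDR.

```shell
#!/bin/sh
# Expand (first node, node count, starting IP address) into one
# "node address" pair per line, assuming the last octet increases by one
# per node. expand_range is a hypothetical helper, not a PSSP command.
expand_range() {
    first_node=$1
    count=$2
    start_ip=$3
    prefix=${start_ip%.*}      # first three octets
    host=${start_ip##*.}       # last octet
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$((first_node + i)) $prefix.$((host + i))"
        i=$((i + 1))
    done
}

expand_range 1 1 192.168.3.11
expand_range 5 5 192.168.31.15
expand_range 10 6 192.168.31.110
```

The three calls correspond to the three ranges used in our environment: node 1 alone, nodes 5 to 9, and nodes 10 to 15.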

9.2.6 Discover or configure the Ethernet hardware address

Once the en0 IP addresses of the nodes are known, the SDR must be loaded
with the hardware (MAC) addresses of these en0 adapters for future use by
the bootp protocol during the installation of the AIX image onto each node
through the network.

This task is performed by sphrdwrad, only on the CWS, as a command or by
using the SMIT Get Hardware Ethernet Address window (smitty
hrdwrad_dialog).

If you already know this information, you can provide it by creating the file
/etc/bootptab.info (for more details, see 9.3.1, “/etc/bootptab.info” on page
285) to speed up the sphrdwrad command. For each node in the argument list
of the command, sphrdwrad first looks for the hardware address in
/etc/bootptab.info. If it cannot find it there, it queries the node through the
hardware connection to the frame (serial link). In the latter case, the node is
powered down and powered up.

-Note- 

Do not use the sphrdwrad command on a running node since it will be 
powered off. 
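
The lookup order described above can be sketched as follows. The file layout used here, one "<node_number> <hardware_address>" pair per line, is an assumption made for this example (see 9.3.1 for the actual format); lookup_hwaddr is a hypothetical helper, and the addresses are made up.

```shell
#!/bin/sh
# Sketch of the lookup order: use the address from a bootptab.info-style
# file when the node is listed, otherwise report that the node would have
# to be queried (and power-cycled) through the serial link.
# Assumed file layout: "<node_number> <hardware_address>" per line.
# lookup_hwaddr is a hypothetical helper, not a PSSP command.
lookup_hwaddr() {
    file=$1
    node=$2
    addr=$(awk -v n="$node" '$1 == n { print $2; exit }' "$file" 2>/dev/null)
    if [ -n "$addr" ]; then
        echo "node $node: $addr (from $file)"
    else
        echo "node $node: not listed, would be queried through the frame"
    fi
}

# Demonstration with a scratch file and made-up addresses.
info=${TMPDIR:-/tmp}/bootptab.info.$$
printf '%s\n' '1 02608CF534CC' '5 10005AFA13AF' > "$info"
lookup_hwaddr "$info" 5
lookup_hwaddr "$info" 6
rm -f "$info"
```

Listing every node in the file avoids the power cycle entirely, which matters if any of the nodes is already running.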


In our environment, we can use either command to discover the en0 adapter
hardware addresses:

sphrdwrad 1 1 rest 


or 


sphrdwrad 1 1 12 


9.2.7 Configure additional adapters for nodes 

In addition to the en0 adapters, SP nodes can have other adapters used for
IP communication: A second Ethernet adapter for connecting to a corporate
backbone or to another segment of the SP Ethernet administrative network,
Token Ring adapters, and so on.

The spadaptrs command is used to configure these additional adapters into
the SDR. It executes on the CWS only using the command line interface or
the equivalent functions accessible from the SMIT Additional Adapter
Database Information window (smitty add_adapt_dialog).

The spethernt command configures the en0 adapter, and the spadaptrs
command is its counterpart for the other adapters. As with the spethernt
command, you can use spadaptrs to configure the IP address of individual
adapters or of a range of adapters, and you can specify the type of adapter
(Ethernet, Token Ring, and so on), the subnet mask associated with the
adapter, and so on.

Only Ethernet, Token Ring, FDDI, and Switch (css0) adapters can be
configured using spadaptrs. Other types of adapters (ATM, ESCON) cannot
be configured this way. You must either configure them manually after the
nodes are installed or write configuration code for them in the shell
customization script firstboot.cust (see “firstboot.cust” on page 290).

For the switch adapters, two options, -a and -n, allow you to allocate IP
addresses to switch adapters sequentially based on the switch node numbers.

In our environment, we only need to define the second Ethernet adapter of
node 1 and the switch adapters (css0) of all SP nodes:

spadaptrs -s yes -n no -a yes 1 1 1 en1 192.168.31.11 255.255.255.0
spadaptrs -s yes -n no -a yes 1 1 12 css0 192.168.13.1 255.255.255.0


9.2.8 Assign initial host names to nodes 

Once the SDR contains all IP information about the adapters of all nodes, you 
can change the host name of the nodes, also known as initial host name. 

This optional step is performed using sphostnam, on the CWS only, as a
command or through the SMIT Hostname Information window (smitty
hostname_dialog).

The default is to assign the long symbolic name of the en0 adapter as the
host name of the node. If your site policy is different (for example, you may
want to give the node the name of the adapter connected to your corporate
network as its host name), use sphostnam to change the initial host name.
Like the previous commands, this command applies either to one node or to a
range or list of nodes.

In our environment, we only want to change the format of the name and use
the short names but keep the en0 adapter name as host name:

sphostnam -f short 1 1 12 
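
The difference between the long and short formats can be illustrated with shell parameter expansion: the short form keeps everything before the first dot. This is only an illustration of the naming, not of what sphostnam does internally; short_name is a hypothetical helper, and the domain is made up.

```shell
#!/bin/sh
# Long versus short host name format: the short form is the part of the
# name before the first dot. short_name is a hypothetical helper and the
# domain below is made up for the example.
short_name() {
    echo "${1%%.*}"
}

short_name sp3n01.example.com
short_name sp3n05
```

A name that is already short is returned unchanged.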


9.2.9 Create authorization files 

This step is only executed in PSSP 3.1. There is no equivalent step in PSSP 
2.4. 

You now have to create the appropriate authorization files for use by root’s
remote commands, such as rcp, rsh, and so on, on the CWS. Possible
methods are Kerberos 4 and AIX standard authentication.

On the CWS, you can use either spsetauth or the SMIT Select Authorization 
Methods for Root access to Remote Commands window (smitty spauth_rcmd). 

In our environment, we configured both Kerberos 4 and AIX standard
methods for the default partition (sp3en0):

spsetauth -d -p sp3en0 k4 std 


9.2.10 Enable selected authentication methods 

This step is only executed in PSSP 3.1. There is no equivalent step in PSSP 
2.4. 

You can now choose the authentication methods used for System 
Management tasks. Valid methods are Kerberos 4, Standard AIX, and 
Kerberos 5 (DCE). 

You perform this task only on the CWS using either chauthpar or the SMIT
Select Authorization Methods for Root access to Remote Commands window
(smitty spauth_methods).

In our environment, we wish to use both Kerberos 4 and AIX standard 
methods in the default partition: 

chauthpar k4 std 


9.2.11 Start system partition-sensitive subsystems 

After the SDR has been loaded with the frame and nodes information, IP 
addresses, symbolic names, and routes, you have to add and start all 
subsystems in the default partition. 

The syspar_ctrl command controls the system partition sensitive subsystems
on the CWS and on the SP nodes. At this point, it is used only on the CWS
since the nodes are still not installed:

syspar_ctrl -A 

This command will start the following daemons: hats, hags, haem, hr, pman,
emon, spconfigd, emcond, and spdmd (optional).

Execution of this command on the CWS only starts the daemons on the CWS 
and not on any SP node. Since the daemons need to execute on all machines 
of the SP system for the subsystem to run successfully, syspar_ctri -a must 
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also be executed on each node when it is up. This is performed automatically 
at reboot time by the /etc/rc.sp script. 

9.2.12 Set up nodes to be installed 

This step is different in PSSP 2.4 and PSSP 3.1. 

In PSSP 2.4, the spbootins command performs this task entirely, while in
PSSP 3.1, the task is split between the spchvgobj and spbootins commands.
Sections 9.2.13, “spchvgobj” on page 277 and 9.2.14, “spbootins” on page 278
describe the functions performed by the commands in each case.

9.2.13 spchvgobj 

The spchvgobj command executes on the CWS only. 

It is equivalent to the SMIT Change Volume Group Information window
(smitty changevg_dialog).

This command is only available in PSSP 3.1. It was added as part of a new
PSSP feature that allows a node to have several bootable volume groups. The
boot/install configuration, which was specific to a node up to PSSP 2.4, is
now specific to a volume group.

The PSSP installation scripts use a default configuration for the boot/install 
servers, the AIX image (mksysb) that will be installed on each node, and the 
disk where this image will be installed. This default is based on information 
that you entered in the Site Environment Information panel. The default is to 
define as the boot/install server(s): 

• The CWS for a one frame system 

• The CWS and the first node of each frame in a multi-frame system 

The default is to use rootvg as the default bootable volume group on hdisk0.

If you wish to use a different configuration, you can use the spchvgobj
command or its SMIT equivalent to specify, for a set of SP nodes and for a
bootable volume group, the names of the disks on which to install the AIX
image, the number of mirrored disks, the name of the boot/install server from
which to fetch the AIX image, the name of the installable image, the name of
the AIX lpp source directory, and the level of PSSP to be installed on the
nodes.

In our environment, since we already plan to add a new frame at the time of
the installation of the first frame, we want to force nodes 5 to 15 to point to
node 1 as the boot/install server. We can, therefore, use:

spchvgobj -n 1 1 5 11

where -n 1 indicates that node 1 is the server, and 1 5 11 indicates that 11
nodes, starting from frame 1 node 5, will point to this server.
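
The three trailing arguments (start frame, start slot, node count) recur in several commands of this chapter, so it can help to see which node numbers such a triple covers. The following is only a sketch, not part of PSSP: the function name expand_nodes is hypothetical, and it assumes 16 slots per frame with one thin node in every slot, so that the node number is (frame - 1) * 16 + slot.

```shell
#!/bin/sh
# Expand a "start_frame start_slot node_count" triple into node numbers.
# Assumption: 16 slots per frame and one (thin) node per slot.
expand_nodes() {
    start_frame=$1
    start_slot=$2
    count=$3
    # Node number of the first node: (frame - 1) * 16 + slot
    node=$(( (start_frame - 1) * 16 + start_slot ))
    last=$(( node + count - 1 ))
    list=""
    while [ "$node" -le "$last" ]; do
        list="$list $node"
        node=$(( node + 1 ))
    done
    # Strip the leading blank before printing
    echo "${list# }"
}

expand_nodes 1 5 11    # prints: 5 6 7 8 9 10 11 12 13 14 15
```

With the triple 1 5 11 used above, the sketch lists nodes 5 to 15, which matches the range we want to point to node 1.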

9.2.14 spbootins 

The spbootins command executes on the CWS only. 

It is equivalent to the SMIT Boot/Install Server Information window (smitty
server_dialog).

This command behaves differently in PSSP 2.4 and PSSP 3.1. 

9.2.14.1 spbootins in PSSP 2.4 

In PSSP 2.4, the boot/install server configuration is associated with each node.
The use of spbootins during installation is optional. You can use it to change
the default boot/install configuration for a set of nodes: the names of the disks
on which to install the AIX image, the name of the boot/install server from
which to fetch the AIX image, the name of the installable image, the name of
the AIX lppsource directory, the level of PSSP to be installed on the nodes,
and how they will perform their next boot (from a server or from their disk).

If our environment was running under PSSP 2.4 instead of PSSP 3.1, then we 
would have used: 

spbootins -n 1 1 5 11 

to perform the same customization as in the example given in 9.2.13, 
“spchvgobj” on page 277. 

9.2.14.2 spbootins in PSSP 3.1 

As mentioned in 9.2.13, “spchvgobj” on page 277, most of the boot/install 
server configuration in PSSP 3.1 is associated with a volume group. The 
spbootins command is used to define in the SDR a set of SP nodes, the 
volume group on which they will boot, and how they will perform their next 
boot (from a server or from their disk). 

In our environment, we use: 

spbootins -s no -r install 1 1 12 

to specify that, at their next reboot, all nodes in frame one are to load the AIX
image from their respective boot/install server, and that setup_server is not
to be run.

9.2.15 Configure the CWS as boot/install server

The SDR now contains all required information to create a boot/install server 
on the CWS. 

The setup_server command configures the machine where it is executed 
(CWS or SP node) as a boot/install server. This command has no argument. 
It executes on the CWS and any additional boot/install servers. It is 
equivalent to clicking on Run setup_server Command in the SMIT Enter 
Database Information window (smitty enter_data). 

At this point, only the CWS will be configured since the other nodes are still 
not running. 

On the CWS, this command could have been executed automatically if you 
had specified the -s yes option when running spbootins. 

Since we did not use this option previously in our environment, we have to 
execute setup_server. 

On an additional boot/install server node, setup_server is automatically
executed immediately after installation of the node if it has been defined as a
server during the SDR configuration. Since this step can take a long time to
complete, we recommend that after the server node installation, you check
the /var/adm/SPlogs/sysman/<node>.console.log file. It will contain
information about the progress of the setup_server operation. This operation
must be successfully completed before you try to install any client node from
the server.

The setup_server command is a complex Perl program. It executes a series of
configuration commands, called wrappers, that perform various tasks, such
as configuring PSSP or setting up the NIM environment. Here is a simplified
control flow sequence of the setup_server command (where present, the name
after the colon is the wrapper that actually performs the action associated
with the step):

1. Get information from SDR and ODM. 

2. Check prerequisites. 

3. Configure PSSP services on this node: services_config 

4. If running on CWS, then perform CWS-specific tasks: setup_cws 

5. Get an authentication ticket: kinit 

6. If running on a NIM master, but not a boot/install server, then unconfigure
NIM and uninstall NIM filesets: delnimmast -l <node_number>

7. If not running on the CWS or boot/install server, then exit.

8. If any NIM clients are no longer boot/install clients, then delete them from
the NIM configuration database: delnimclient -s <server_node_numb>

9. Make this node a NIM master: mknimmast -l <node_number>

10. Create tftp access and srvtab files on this master: create_krb_files

11. Make NIM interfaces for this node: mknimint -l <node_number>

12. Make the necessary NIM resources on this master:
mknimres -l <node_number>

13. Make NIM clients of all of this node's boot/install clients:
mknimclient -l <client_node_list>

14. Make the config_info files for this master’s clients: mkconfig

15. Make the install_info files for this master’s clients: mkinstall

16. Export pssplpp file system to all clients of this server: export_clients

17. Allocate the necessary NIM resources to each client:
allnimres -l <client_node_list>

18. Remove the authentication ticket. 
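
To make the branching in this sequence easier to follow, the sketch below prints the wrappers in the order the simplified flow would call them, depending on where setup_server runs. It is a dry run written for illustration only: the real setup_server is a Perl program, the function plan_setup_server and its arguments are hypothetical, the comma-separated client list is an assumed format, and steps 1, 2, and 18 are left out.

```shell
#!/bin/sh
# Dry run: print, in order, the wrappers that the simplified flow above
# would call. Nothing is executed.
#   $1 node number, $2 role (cws | server | node),
#   $3 "yes" if the machine is currently a NIM master, $4 client node list
plan_setup_server() {
    node=$1; role=$2; nim_master=$3; clients=$4
    echo "services_config"                             # step 3
    if [ "$role" = cws ]; then echo "setup_cws"; fi    # step 4
    echo "kinit"                                       # step 5
    if [ "$role" = node ]; then
        # Steps 6 and 7: not a boot/install server, so clean up NIM and stop
        if [ "$nim_master" = yes ]; then echo "delnimmast -l $node"; fi
        return 0
    fi
    echo "delnimclient -s $node"                       # step 8
    echo "mknimmast -l $node"                          # step 9
    echo "create_krb_files"                            # step 10
    echo "mknimint -l $node"                           # step 11
    echo "mknimres -l $node"                           # step 12
    echo "mknimclient -l $clients"                     # step 13
    echo "mkconfig"                                    # step 14
    echo "mkinstall"                                   # step 15
    echo "export_clients"                              # step 16
    echo "allnimres -l $clients"                       # step 17
}

# Node 1 acting as boot/install server for nodes 5, 6, and 7
plan_setup_server 1 server no "5,6,7"
```

Calling the function with the role node and a NIM master flag of yes shows the short path: services_config, kinit, delnimmast, and then exit.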

9.2.16 Set the switch topology 

If a switch is part of the SP system, you now have to store the switch topology 
into the SDR. 

Sample topology files are provided with PSSP in the /etc/SP directory. These 
samples correspond to most of the topologies used by customers. If none of 
the samples match your real switch topology, you have to create one using 
the partitioning tool provided with PSSP (System Partitioning Aid available 
from the Perspectives Launch Pad). Once this file is created, it must be 
annotated and stored in the SDR (here, annotated means that the generic 
topology contained in the sample file is customized to reflect information 
about the real switch connections using the information stored in the SDR). 

This task is performed using the Eannotator and Etopology commands on the
CWS or by using the equivalent SMIT Topology File Annotator (smitty
annotator) and Store a Topology File windows (smitty etopology_store).

In our environment, since we have one Node Switch Board and no 
Intermediate Switch Board, we use: 

Eannotator -F /etc/SP/expected.top.1nsb.0isb.0 -f /etc/SP/expected.top.annotated -O no
Etopology /etc/SP/expected.top.annotated

9.2.17 Verify the switch primary and primary backup nodes

After choosing the switch topology, you can change the default primary and 
primary backup nodes using the Eprimary command or the SMIT Set 
Primary/Primary Backup Node window (smitty primary_node_dialog).

Assuming that, in our environment, we wish to use node 5 instead of the 
default value, node 1, as the primary node, we use: 

Eprimary -init 5 


9.2.18 Set the clock source for all switches 

The last step in the configuration of the switch is to choose a clock source for
all switches and to store this information in the SDR. This is done using the
Eclock command on the CWS.

Sample clock topology files are provided in the SDR. You can choose to use
one of them or let the system decide for you.

In our environment, since there is only one switch, we let Eclock
automatically make the decision:

Eclock -d


9.2.19 Network boot the boot/install server nodes 

After configuring the switch, we are finally ready to install the SP nodes. This 
operation is two-fold. In the first stage, all additional boot/install servers are 
installed through the Ethernet network from the CWS. In the second stage, all 
remaining nodes are installed from their boot/install servers. 

In normal cases, the installation of a node requires that you open two shell
windows on your CWS display. One is used to monitor the execution of
the installation using the s1term command, while the other is used to
initiate the installation using the nodecond command. These two commands
execute in parallel. We present them sequentially in 9.2.20, “s1term” on page
281 and 9.2.21, “nodecond” on page 282.

9.2.20 s1term

The s1term command executes on the CWS only.

The s1term command opens a connection to the SP node serial port. Since
the node console is, by default, associated with this port, s1term provides
remote console access to the SP node from the CWS through the serial link.
It is, therefore, a very useful command to take control of a node when the IP
connection through the Ethernet network is not available.

By default, s1term provides a read-only connection. If you wish to enter
commands on the node, you need to establish a read-write connection by
using the -w option.

During installation, the nodecond command needs write access to the node on
the serial link. The write access cannot be shared by several clients. You
must, therefore, only open a read-only connection to monitor the node
installation and see all messages displayed on the node console.

In our environment, we first boot node 1, which is a boot/install server.
We, therefore, open a connection on this node:

s1term 1 1

After the boot/install servers have successfully been installed, you can start
the installation of the other nodes. To monitor this installation, you can open
one s1term session to each of these nodes in parallel.

9.2.21 nodecond 

The nodecond command executes on CWS. 

This is equivalent to clicking on Run setup_server Command in SMIT Enter 
database Information window (smitty enter_data). 

In parallel with the siterm window, you can now initiate the boot and system 
installation on the target node. This phase is called node conditioning, and it 
is executed by the nodecond command. It is executed on the CWS for all nodes 
even if their boot/install server is not the CWS. 

Once started, this command does not require any user input. It can, 
therefore, be started as a shell background process. If you have several 
nodes to install from the control workstation, you can start one nodecond 
command for each of them from the same shell. However, for performance 
reasons, it is not recommended to simultaneously install more than eight 
nodes from the same boot/install server. 

In our environment, we first need to install node 1 and start the command in 
the background: 

nodecond 1 1 & 

After all boot/install servers have been successfully installed, you can condition
the remaining nodes of your SP system.

In our environment, we would perform:

for i in 5 6 7 8 9 10
do
    nodecond 1 $i &
done

then wait for completion, since they all boot from node 1, before running:

for i in 11 12 13 14 15
do
    nodecond 1 $i &
done
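
The two loops can also be combined into one script that starts the nodes in batches and waits for each batch to return before starting the next. This is a sketch under stated assumptions rather than a PSSP tool: the function condition_nodes and the variables NODECOND and BATCH_SIZE are hypothetical, NODECOND defaults to an echo so that the script can be tried without touching any node, and the batch size of eight follows the limit recommended above.

```shell
#!/bin/sh
# Condition nodes in batches, waiting for each batch to return before
# starting the next one.
# NODECOND defaults to a dry run (echo); set NODECOND=nodecond on a real CWS.
NODECOND=${NODECOND:-"echo nodecond"}
BATCH_SIZE=${BATCH_SIZE:-8}

condition_nodes() {
    frame=$1
    shift
    started=0
    for slot in "$@"; do
        $NODECOND "$frame" "$slot" &
        started=$(( started + 1 ))
        if [ "$started" -ge "$BATCH_SIZE" ]; then
            wait            # block until every background job of this batch ends
            started=0
        fi
    done
    wait                    # wait for the last, possibly partial, batch
}

condition_nodes 1 5 6 7 8 9 10 11 12 13 14 15
```

Because the commands of a batch run in the background, their messages may appear in any order.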


It is also possible to perform the installation using the Perspectives graphical 
user interface. In the Hardware Perspective window, you select the node you 
want to install, and then you need only to click on the Network Boot... item of 
the Action menu. 

Manual node conditioning 

In some cases, you may want to perform the node installation manually
rather than automatically using nodecond. Manual node conditioning is a more
complex task consisting of several steps. Since it highly depends on the
hardware type of the node, these steps differ for each category of nodes.

For a Thin or Wide node (non-SMP), using either spmon -g (PSSP 2.4) or
Hardware Perspective (PSSP 3.1) and an s1term -w window, the steps are:

1. Power off the node (if it is powered on). 

2. Put the key in Secure mode. 

3. Power on the node. 

4. When the LED gets to 200, put the key in Service mode.

5. Reset the node.

6. When the LED reaches 260 or 262, the Main menu is displayed. Select
option 1 Select Boot Device.

7. In the Select Boot (startup) Device menu, select your network adapter 
to boot from. 

8. You will be prompted to enter IP addresses for the client (the node to 
be installed), the server (the boot/install server for the node being 
installed), and a gateway (which you may leave empty). 

9. Return to the main menu. 

10. Select option 3 to send a test transmission ping between the client and
the server.

11. Return to the main menu and select option 4 to start the system boot.

12. At this point, make sure the key is in Normal mode before the
installation finishes by using the command:

spmon -key normal <node>

For a High Node, the steps are:

1. Power the node off using spmon. 

2. Set the key to Service by invoking spmon.

3. Open a tty connection using a read/write s1term.

4. Press Enter to get the BUMP prompt and type sbb.

5. Choose option 1 Set Flags from the STAND-BY MENU.

6. Verify that Fast IPL is enabled. If it is disabled, enable it.

7. Select x until the > prompt reappears.

8. Power on the node by using spmon.

9. Wait for the Maintenance Menu to appear.

10. Select option 6 System Boot.

11. In the System Boot Menu, select option 1 Boot from network.

12. Wait for the Main Menu of the boot install and select option 4 Exit Main
Menu and Start System (BOOT).

For a CHRP Node (SMP Thin and Wide), perform the following steps: 

1. Power the node off by invoking spmon. 

2. Open a tty connection using a read/write s1term.

3. Power on the node using spmon.

4. Wait for a panel to appear as shown in Figure 107.

Figure 107. Boot screen

5. Type i after scsi appears on the screen.

6. When the SMS Main Menu screen appears, select option 3 Utilities.

7. When the Utilities Menu appears, choose option 4 Remote Initial
Program Load Setup.

8. In the Network Parameters Menu, choose option 2 Adapter
Parameters.

9. In the Adapter Parameters Menu, select the appropriate adapter
parameters.

10. Select X to exit menus until you get the SMS Main Menu.

11. Select 2 to go to the Multiboot Menu.

12. When the Multiboot Menu appears, select option 4 Select Boot Devices.

13. When the Boot Device Menu appears, select option 3 Configure the
first boot device.

14. In the Configure Boot Device Menu, select the appropriate network
interface.

15. Select X until the SMS Main Menu appears.

16. Select X to start the booting process.

9.2.22 Check the system 

At this point, we recommend that you check the SP system using the 
SYSMAN_test command (see 10.3.1.7, “Checking Sysman components: 
SYSMAN_test” on page 299). 

9.2.23 Start the switch 

Once all nodes have been installed and booted, you can start the switch. This 
is performed using the Estart command on the CWS or clicking on Start 
Switch in the SMIT Perform Switch Operation Menu. 


9.3 Key files 

As for the commands presented previously, this section only presents the 
major system files used by PSSP. 

9.3.1 /etc/bootptab.info 

The bootptab.info file specifies the hardware (MAC) address of the en0
adapter of SP nodes. It is used to speed up the execution of the sphrdwrad
command. Each line contains the information for one node and is made of
two parts: the node identifier and the MAC address.

The node identifier can be either the node number or a pair
<frame_number>,<slot> separated by a comma with no blanks.

The MAC address is separated from the node identifier by a blank. It is
formatted in hexadecimal with no . or : separators. The leading 0 of each part
of the MAC address must be present.

In our environment, the /etc/bootptab.info file could be the example shown in
Figure 108.


> cat /etc/bootptab.info
1 02608CF534CC
5 10005AFA13AF
1,6 10005AFA1B12
1,7 10005AFA13D1
8 10005AFA0447
9 10005AFA158A
10 10005AFA159D
11 10005AFA147C
1,12 10005AFA0AB5
1,13 10005AFA1A92
1,14 10005AFA0333
1,15 02608C2E7785


Figure 108. Example of /etc/bootptab.info 

For example, 1,14 10:0:5A:fa:03:33 is not a valid entry, even though the
second string is a usual format for MAC addresses.
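
These format rules can be checked mechanically before running sphrdwrad. The following sketch is not a PSSP command: the function name valid_bootptab_entry is hypothetical, and it assumes that a MAC address is exactly 12 hexadecimal digits and that lowercase digits are acceptable.

```shell
#!/bin/sh
# Check one bootptab.info entry: "<node_number> <MAC>" or "<frame>,<slot> <MAC>",
# where the MAC address is 12 hexadecimal digits without separators.
valid_bootptab_entry() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]+(,[0-9]+)? [0-9A-Fa-f]{12}$'
}

for entry in "1,14 10005AFA0333" "1,14 10:0:5A:fa:03:33"; do
    if valid_bootptab_entry "$entry"; then
        echo "valid:   $entry"
    else
        echo "invalid: $entry"
    fi
done
```

The first entry is reported as valid and the second as invalid, because of the colons and the missing leading zero.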

9.3.2 /tftpboot 

The /tftpboot directory exists on the CWS, the boot/install server, and on the 
SP client nodes. 

On the CWS, and other boot/install servers, this directory is used as a 
repository for files that will be distributed to the client nodes during their 
installation. On the client nodes, the directory is used as a temporary storage 
area where files are downloaded from the boot/install server /tftpboot 
directory. 

The customization of the boot/install server (setup_server command) creates
several files in /tftpboot:

• <spot_name>.<archi>.<kernel_type>.<network>

• <hostname>-new-srvtab

• <hostname>.config_info

• <hostname>.install_info

You can also manually add customization scripts to the /tftpboot directory:

• tuning.cust

• script.cust 

• firstboot.cust 

In our environment, the /tftpboot directory of the CWS contains the files listed 
in Figure 109. 


[sp3en0:/]# ls -al /tftpboot
total 7722
drwxrwxr-x   3 root     system       512 Dec 15 15:33 .
drwxr-xr-x  22 bin      bin         1024 Dec 15 14:58 ..
-rw-r--r--   1 bin      bin        11389 Dec 03 12:03 firstboot.cust
drwxrwx---   2 root     system       512 Nov 12 16:09 lost+found
-r--------   1 nobody   system       118 Dec 12 13:15 sp3n01-new-srvtab
-rw-r--r--   1 root     system       254 Dec 12 13:15 sp3n01.msc.itso.ibm.com.config_info
-rw-r--r--   1 root     system       795 Dec 12 13:15 sp3n01.msc.itso.ibm.com.install_info
-rw-r--r--   1 root     system   3928595 Dec 15 15:33 spot_aix432.rs6k.mp.ent
-rw-r--r--   1 root     sys         2250 Nov 13 13:59 tuning.cust
[sp3en0:/]#




Figure 109. Contents of the CWS /tftpboot directory 

We will now describe in more detail the role of each of these files. 

9.3.2.1 <spot_name>.<archi>.<kernel_type>.<network> 

Files with this name format are bootable images. The naming convention
is:

• <spot_name> is the name of the spot from which this bootable image has been
created. It is identical to the name of a spot subdirectory located under
/spdata/sys1/install/<aix_level>/spot. In our environment, the spot name is
spot_aix432.

• <archi> is the machine architecture that can load this bootable image. It is
one of rs6k, rspc, or chrp.

• <kernel_type> refers to the number of processors of the machine that can
run this image. It is either up for a uniprocessor or mp for a multiprocessor.

• <network> depends on the type of network adapter through which the
client machine will boot on this image. It can be ent, tok, fddi, or generic.

These files are created by the setup_server command. Only the images
corresponding to the spot_name, architecture, and kernel type of the nodes
defined to boot from the boot/install server will be generated, not all possible
combinations of these options.

For each node, the tftpboot directory will contain a symbolic link to the 
appropriate bootable image. You can see an example of this in Figure 109 on 
page 287 where this file is called spot_aix432.rs6k.mp.ent. 
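
As an illustration of the naming convention, the sketch below splits a boot image name into its four parts using shell parameter expansion. The function name parse_boot_image is hypothetical. The three rightmost fields are removed first, so a spot name that itself contains a period would still be handled.

```shell
#!/bin/sh
# Split "<spot_name>.<archi>.<kernel_type>.<network>" into its four parts.
# The last three fields are peeled off from the right.
parse_boot_image() {
    name=$1
    network=${name##*.};  rest=${name%.*}
    kernel=${rest##*.};   rest=${rest%.*}
    archi=${rest##*.};    spot=${rest%.*}
    echo "spot=$spot archi=$archi kernel=$kernel network=$network"
}

parse_boot_image spot_aix432.rs6k.mp.ent
# prints: spot=spot_aix432 archi=rs6k kernel=mp network=ent
```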

9.3.2.2 <hostname>-new-srvtab 

These files are created by the create_krb_files wrapper of setup_server.
<hostname> is the reliable host name of an SP node. For each client node of 
a boot/install server, one such file is created in the server /tftpboot directory. 

This file contains the passwords for the rcmd principals of the SP node. Each 
SP node retrieves its <hostname>-new-srvtab file from the server and stores 
it in its /etc directory as krb-srvtab. 

9.3.2.3 <hostname>.install_info 

These files are created by the mkinstall wrapper of setup_server.
<hostname> is the reliable host name of an SP node. For each client node of
a boot/install server, one such file is created in the server /tftpboot directory.

This file is a shell script containing mainly shell variables describing the
node en0 IP address and host name, and the boot/install server IP address
and host name.

After the node AIX image has been installed through the network, the
pssp_script script downloads the <hostname>.install_info file into its own
/tftpboot directory and executes it to define the environment
variables it needs to continue the node customization.

This file is also used by other customization scripts like psspfb_script. 

9.3.2.4 <hostname>.config_info 

These files are created by the mkconfig wrapper of setup_server. <hostname> 
is the reliable host name of an SP node. For each client node of a boot/install 
server, one such file is created in the server /tftpboot directory. 

This file contains node configuration information, such as node number,
switch node information, default route, initial hostname, and CWS IP
information.

After the pssp_script script has executed the <hostname>.install_info script,
it downloads the <hostname>.config_info file into the node /tftpboot directory
and configures the node using the information in this file.

9.3.2.5 tuning.cust 

The tuning.cust file is a shell script that sets tuning options for IP 
communications. A default sample file is provided with PSSP in 
/usr/lpp/ssp/samples/tuning.cust. Three files are also provided that contain 
recommended settings for scientific, commercial, or development 
environments (in /usr/lpp/ssp/install/config). 

Before starting the installation of the nodes, you can copy one of the three 
pre-customized files into the /tftpboot directory of the CWS, or you can 
provide your own tuning file. Otherwise, the default sample will be copied to 
/tftpboot by the installation scripts. 

During the installation of additional boot/install servers, the tuning.cust file will 
be copied from the CWS /tftpboot directory to each server /tftpboot directory. 

During the installation of each node, the file will be downloaded to the node. It 
is called by the /etc/rc.net file; so, it will be executed each time a node 
reboots. 

You should note that tuning.cust sets ipforwarding=1. So, you may want to 
change this value for nodes that are not IP gateways directly in the 
/tftpboot/tuning.cust on the node (not on boot/install servers). 
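
One possible way to make this change on a node is sketched below. The function name disable_ipforwarding is hypothetical, and the sample line written to the scratch file is only an assumed example of a tuning.cust entry (the AIX no -o option syntax), not the contents of the file shipped with PSSP. The original file is rewritten in place so that its permissions are kept.

```shell
#!/bin/sh
# Turn ipforwarding off in a tuning.cust file, keeping the file's permissions.
disable_ipforwarding() {
    file=$1
    tmp="$file.bak.$$"
    # Work from a copy and rewrite the original in place so its mode is unchanged
    cp "$file" "$tmp" &&
        sed 's/ipforwarding=1/ipforwarding=0/' "$tmp" > "$file" &&
        rm -f "$tmp"
}

# Demonstration on a scratch file; on a node the argument would be
# /tftpboot/tuning.cust.
demo=${TMPDIR:-/tmp}/tuning.cust.demo.$$
echo '/usr/sbin/no -o ipforwarding=1' > "$demo"
disable_ipforwarding "$demo"
cat "$demo"
rm -f "$demo"
```

The new value is applied the next time /etc/rc.net calls the file, that is, at the next reboot of the node.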

9.3.2.6 script.cust 

The script.cust file is a shell script that will be executed at the end of the node 
installation and customization process before the node is rebooted. The use 
of this file is optional. It is a user provided customization file. You can use it to 
perform additional customization that requires a node reboot to be taken into 
account. 

Typically, this script is used to set the time zone, modify paging space,
and so on. It can also be used to update global variables in the
/etc/environment file.

A sample script.cust file is provided with PSSP in /usr/lpp/ssp/samples. If you
want to use this optional script, you must first create it in the /tftpboot
directory of a boot/install server by either providing your own script or copying
and modifying the sample script. During node installation, the file is copied
from the boot/install server onto the node.

You can either create one such file in the /tftpboot of the CWS, in which case
it will be used on all nodes of the SP system, or you can create a different one
in the /tftpboot of each boot/install server to have a different behavior for each
set of client nodes of each server.

9.3.2.7 firstboot.cust 

The firstboot.cust file is a shell script that will be executed at the end of the 
node installation and customization process after the node is rebooted. The 
use of this file is optional. It is a user provided customization file. This is the 
recommended place to add most of your customization. 

This file should be used for importing a volume group, defining a host name 
resolution method used on a node, or installing additional software. 

It is installed on the nodes in the same way as script.cust: It must be created 
in the /tftpboot directory of a boot/install server and is automatically 
propagated to all nodes by the node installation process. 

- Note - 

At the end of the execution of the firstboot.cust script, the host name resolution
method (/etc/hosts, NIS, DNS) MUST be defined and able to resolve all IP
addresses of the SP system: CWS, nodes, the Kerberos server, and the
NTP server. If it is not, the reboot process will not complete correctly.

If you do not define this method sooner, either by including a configured file
in the mksysb image or by performing the customization in the script.cust
file, you must perform this task in the firstboot.cust file.
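
If /etc/hosts is the chosen resolution method, a firstboot.cust script could add the missing entries with a small helper such as the one sketched below. This is only an illustration: the function name add_host_entry is hypothetical, and the IP address used in the demonstration is made up.

```shell
#!/bin/sh
# Append "<ip> <name>" to a hosts file unless <name> already appears
# as a host name or alias on a non-comment line.
add_host_entry() {
    hosts_file=$1; ip=$2; name=$3
    touch "$hosts_file"
    if ! awk -v n="$name" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END        { exit !found }
         ' "$hosts_file"
    then
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Demonstration on a scratch file; in firstboot.cust the file would be /etc/hosts.
demo=${TMPDIR:-/tmp}/hosts.demo.$$
add_host_entry "$demo" 192.168.3.130 sp3en0
add_host_entry "$demo" 192.168.3.130 sp3en0     # second call adds nothing
cat "$demo"
rm -f "$demo"
```

The helper can be called repeatedly without creating duplicate lines, which matters if the script is ever run more than once on the same node.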


9.3.3 /usr/sys/inst.images 

This directory is the standard location for storing an installable lpp image on
an AIX system when you want to install the lpp from disk rather than from the
distribution media (tape, CD). You can, for example, use it if you want to
install a product other than AIX and PSSP on the CWS.

This directory is not used by the PSSP installation scripts. 

9.3.4 /spdata/sys1/install/images

The /spdata/sys1/install/images directory is the repository for all AIX
installable images (mksysb) that will be restored on the SP nodes using the
PSSP installation scripts and the NIM boot/install servers configured during
the CWS installation.

This directory must exist on the CWS, and its name must be kept unchanged.
The SP nodes installation process will not work if the AIX mksysb images are
stored in another directory of the CWS.

If you want to use the default image provided with PSSP (spimg), you must
store it in the /spdata/sys1/install/images directory.

If all nodes have an identical software configuration (same level of AIX and
LPPs), they can share the same mksysb image independently from their
hardware configuration.

If your SP system has several boot/install servers, the installation script will
automatically create the /spdata/sys1/install/images directory on the
boot/install servers and load it with the mksysb images needed by the nodes
that will boot from each of these servers.

9.3.5 /spdata/sys1/install/<aix_level>/lppsource 

For each level of AIX that will be running on a node in the SP system, there
must exist on the CWS an /spdata/sys1/install/<aix_level>/lppsource
directory. The recommended rule is to set the relative pathname <aix_level>
to a name that clearly indicates the level of AIX: aix414, aix421, aix432.
However, this is not required, and you may choose whatever name you wish.

This directory must contain the AIX lpp images corresponding to the AIX
level. In addition, this directory must contain the perfagent code
corresponding to the AIX level. Refer to 8.6.1, “PSSP prerequisites” on
page 258 for the minimal sets of AIX and perfagent lpp to install in this
directory. Starting with AIX release 4.3.2, perfagent.tools is part of AIX and
not Performance Aide for AIX (PAIDE), as it used to be in previous AIX
releases.

If the SP system contains several boot/install servers, this directory will only
exist on the CWS. It will be known as a NIM resource by all servers but will be
defined as hosted by the CWS. When a node needs to use this directory, it
mounts it directly from the CWS, whatever NIM master it is pointing at.

9.3.6 /spdata/sys1/install/pssplpp/PSSP-x.x

For each level of PSSP that will be used by either the CWS or a node in the
SP system, there must exist on the CWS a
/spdata/sys1/install/pssplpp/PSSP-x.x directory where PSSP-x.x is one of
PSSP-2.2, PSSP-2.3, PSSP-2.4, or PSSP-3.1.

During the first step of the PSSP software installation on the CWS (refer to
8.6.2, “PSSP filesets” on page 259), the PSSP source images must be
installed using bffcreate into these directories.

If the SP system contains more than one boot/install server, the installation
scripts will create the /spdata/sys1/install/pssplpp/PSSP-x.x directories on
each server and load them with the PSSP lpp filesets.

9.3.7 /spdata/sys1/install/pssp

You can create this directory manually on the CWS in the first steps of the
PSSP installation. The CWS installation script will then store in this directory
several files that will be used later during the nodes installation through the
network.

/spdata/sys1/install/pssp is also automatically created on the additional
boot/install servers and populated with the following files:

• pssp_script 

pssp_script is executed on each node by NIM after the installation of the 
mksysb on the node and before NIM reboots the node. It is run under a 
single user environment with the RAM file system in place. It installs 
required LPPs (such as PSSP) on the node and does post-PSSP 
installation setup. Additional adapter configuration is performed after the 
node reboot by psspfb_script. 

You should not modify this script. User customization of the node should 
be performed by other scripts: tuning.cust, script.cust, or firstboot.cust 
(refer to 9.3.2, “/tftpboot” on page 286). 

• bosinst_data 

The bosinst_data, bosinst_data_prompt, and bosinst_data_noprompt files are
NIM control files created by the installation of PSSP on the CWS. They are
used during NIM installation of each SP node. They contain configuration
information, such as the device that will be used as the console during
node installation, locale information, and the name of the disk on which to
install the system. For further information, please refer to the AIX V4.3
Network Installation Management Guide and Reference, SC23-4113.

9.3.8 image.data 

In a mksysb system image, the image.data file is used to describe the rootvg 
volume group. In particular, it contains the size of the physical partition 
(PPSIZE) of the disk from which the mksysb was created. You usually do not 
need to modify this file. However, if the mksysb is to be restored on a node
where the PPSIZE is different from the PPSIZE defined in the image.data file,
you may need to manually create a NIM imagedata resource and allocate it to 
the node that needs to be installed. 


9.4 Related documentation 

For complete reference and ordering information for the documents listed in 
this section, see Appendix D, “Related publications” on page 501. 

SP Manuals 

The reader can refer to two sets of documents related to either version 2.4 or 
version 3.1 of PSSP. 

PSSP: Administration Guide, GC23-3897, for PSSP 2.4 and PSSP: 
Administration Guide, SA22-7348, for PSSP 3.1. Chapters 7, 8, 12, 14, and 
15 provide detailed information about the services that may be configured in 
the SP system: Time Server, Automounter, Security, Switch and System 
partitions. 

PSSP: Installation and Migration Guide, GC23-3898, for PSSP 2.4. In
Chapter 2, steps 22 to 59 detail the complete installation process.
Appendixes C, D, F, and G describe the customization of nodes, the wrappers
associated with the setup_server command, and the procedure to solve port
contention issues.

PSSP: Command and Technical Reference, GC23-3900, for PSSP 2.4 and
PSSP: Command and Technical Reference, SA22-7351, for PSSP 3.1 contain
a complete description of each command listed in 9.2, “Installation steps and
associated key commands” on page 267.

SP Redbooks 

RS/6000 SP: PSSP 2.2 Survival Guide, SG24-4928. Chapter 2 contains 
practical tips and hints about specific aspects of the installation process. 

Inside the RS/6000 SP, SG24-5145. Chapter 4 presents the concepts
underlying the SP system software and helps in planning the
installation of this software. Section 5.2 describes the high-level design of the
installation process.

Others 

We recommend the use of either AIX 4.2 Network Installation Management 
Guide and Reference, SC23-1926 or AIX V4.3 Network Installation 
Management Guide and Reference, SC23-4113 for getting any detailed 
information about NIM.

9.5 Sample questions

This section provides a series of questions to help you prepare for
the certification exam. The answers to these questions can be found in
Appendix A.

1. In an SP system, which are true statements regarding a node's initial 
hostname and reliable hostname as defined in the SDR? (Note: Two 
options are correct.) 

A. The initial hostname is the standard TCP/IP hostname associated with 
one of the available TCP/IP interfaces on the node. 

B. The initial hostname refers to an SP node or CWS's hostname prior to 
PSSP installation on the node or CWS. 

C. The reliable hostname is the TCP/IP interface name associated with
the node's en0 interface.

D. The reliable hostname is the TCP/IP interface name associated with 
the interface on an SP node that responds the most quickly to 
heartbeat packets over a period of time. 

2. Once the frames have been configured, and before you start configuring
nodes, it is recommended to check that the frame supervisor microcode is
at the latest level supported by the PSSP level being installed. The
command that checks the supervisor microcode level is:

A. spchksvr -G -r status all

B. spsvrmgr -G -u all 

C. spsvrmgr -G -r status all 

D. It is done through SP Perspectives. 

3. How do you configure node 1 to be a boot/install server?

A. Run the setup_server script on node 1. 

B. Install NIM, and then run setup_server on node 1. 

C. Change the boot/install server field in the SDR for some nodes and 
then run setup_server. 

D. Change the boot/install server field in the SDR for some nodes, and
then run the spbootins command to set those nodes to install.

4. How does the nodecond script access the node to start the network booting 
process? 

A. Through the RS-232 line from the control workstation. 

B. Through TCP/IP from the control workstation.

C. Through the RS-232 line from the boot/install server node.

D. Through the Ethernet network from the control workstation. 

5. Which command is used to configure additional adapters for nodes? 

A. sphrdwrad 

B. spethernt 

C. spadapters 

D. spadaptrs 

6. Which of the following daemons does the syspar_ctrl -A command NOT
start?

A. hats 

B. spconfigd 

C. kerberos 

D. pman 

7. Which of the following statements is true about the /tftpboot directory?

A. The directory only exists on the boot/install server and on the SP client 
nodes. 

B. You cannot add customization scripts to the directory. 

C. The customization of the boot/install server creates several files in the 
/tftpboot directory. 

D. On the client nodes, the directory is used as a permanent storage area. 

8. Which of the following statements is true about s1term?

A. By default, the s1term command provides a read-write connection.

B. It is a useful command for taking control of a node when the IP
connection through the Ethernet network is not available.

C. The s1term command opens a connection to the SP node switch port.

D. The s1term command executes on the CWS and on the nodes.

9. Which of the following commands displays configuration information 
stored in the SDR about the frames and the nodes? 

A. splstconfig 

B. splstdata 

C. spframeconfig 

D. spnodeconfig

10. Which of the following commands configures the machine where it is
executed as a boot/install server?

A. setup_cws 

B. setup_nodes 

C. server_setup 

D. setup_server 


9.6 Exercises 

Here are some exercises you may wish to perform: 

1. On a test system that does not affect any users, check if the Supervisor 
Microcode is up to date or needs to be upgraded. Which command will 
upgrade the Supervisor Microcode? What does the -u flag do to the target 
node? 

2. Which command defines the Nodes Ethernet Information for the study 
guide test environment on page 3? Describe all the necessary steps. 

3. What is the role of the /etc/bootptab.info file?

4. What is the role of the /tftpboot directory? Which customization scripts can 
be manually added to the /tftpboot directory? 

5. Familiarize yourself with the following key files: /etc/bootptab.info,
tuning.cust, script.cust, and firstboot.cust.


Chapter 10. Verification commands and methods


This chapter presents some of the commands and methods available to the 
SP administrator to check that the SP system has been correctly configured, 
initialized, and started. 


10.1 Key concepts you should study 

Before taking the RS/6000 SP certification exam, you should understand the 
following concepts related to verifying and checking an SP system: 

• The splstdata command.

• The various components of an SP system and the different verification
methods that apply to each of them.

• PSSP daemons, System partition sensitive daemons, as well as Switch 
daemons that must be running and how to check if they are alive. 


10.2 Introduction to SP system checking 

Several options are available to the SP user or administrator who wishes to
verify that the system has been successfully installed and is running
correctly:

• Commands and SMIT menus 

• Graphical interfaces 

• Logs 

Section 10.3, “Key commands” on page 297 presents the commands that are 
available for checking various aspects of an SP system. Section 10.4, 
“Graphical user interface” on page 306 gives a few hints about the use of spmon
-g and Perspectives. Section 10.5, “Key daemons” on page 308 focuses on 
the daemons that are important to monitor an SP system. Section 10.6, 
“SP-specific logs” on page 310 lists the logs that are available to the user to 
check the execution of commands and daemons. 


10.3 Key commands 

PSSP comes with several commands for checking the system, and some AIX
commands are also useful to the SP user. In this section, we present the most
widely used AIX and PSSP commands for this purpose.

10.3.1 Verify installation of software

During the CWS and SP system installation, your first verification task
consists of checking that the AIX and PSSP software has been successfully
installed and that the basic components are configured correctly before you
start entering configuration data specific to your environment.

10.3.1.1 Checking software levels: lslpp

The lslpp command is the standard AIX command to check that an lpp has
been installed and to verify its level. You should use it on the CWS after
installation of AIX and after you have installed (installp) the PSSP software
from the /spdata/sys1/install/pssplpp/PSSP-x.x directory. At this point, you
should check the consistency between the levels of AIX, perfagent, and PSSP
using the tables in Section 8.6.1, “PSSP prerequisites” on page 258:

lslpp -La bos* devices* perf* X11* xlC* ssp* rsct* | more

You should also verify that you have installed all PSSP filesets corresponding 
to your SP hardware configuration and to the options you wish to use (VSD, 
RVSD, and so on). 

10.3.1.2 Checking the SDR initialization: SDR_test 

Immediately after initialization of the SDR (install_cw), you should test that
the SDR is functioning properly using the SDR_test command. This command
can also be used later, during operation of the SP system, if you suspect
problems with the SDR.

10.3.1.3 Checking the System Monitor installation: spmon_itest 

The install_cw command also installs the System Monitor (spmon) on the
CWS. At the same time that you test the SDR initialization, you can also test
that spmon is correctly installed by using the spmon_itest command.

10.3.1.4 Checking the System Monitor configuration: spmon_ctest 

After the SP hardware has been discovered by the CWS (spframe), you can 
check that the System Monitor has been correctly configured with the 
information about the SP frames and nodes hardware by using the 
spmon_ctest command. This command also checks if the hardmon daemon is 
running, the serial RS232 links to the frames and nodes are properly 
connected, the CWS can access the frames and nodes hardware through 
these connections, and the hardware information has been stored in the 
SDR.

10.3.1.5 Checking lpp installation on all nodes: lppdiff

After complete installation of an SP system, or any time during the life of the
SP system, you may need to check the level of software installed on all, or a
subset, of nodes. The lppdiff command is an easier alternative to running
lslpp through dsh since it sorts and formats the output by fileset. It can be
used to list any filesets and is not limited to PSSP.

For example, to check all PSSP-related filesets, you can use:

lppdiff -Ga ssp* rsct*


10.3.1.6 Checking PSSP level: splst_versions 

If you only need to look for the PSSP versions installed on the nodes, and not
for all the detailed information returned by lppdiff, you can use the
splst_versions command. For example, in our environment, we can get this
information for each node as shown in Figure 110.


[sp3en0:/usr/lpp/ssp]# splst_versions -tG
1 PSSP-3.1
5 PSSP-3.1
6 PSSP-3.1
7 PSSP-3.1
8 PSSP-3.1
9 PSSP-3.1
10 PSSP-3.1
11 PSSP-3.1
12 PSSP-3.1
13 PSSP-3.1
14 PSSP-3.1
15 PSSP-3.1
[sp3en0:/usr/lpp/ssp]#


Figure 110. PSSP versions installed on each node 
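
On a system with many nodes, a quick consistency check of the PSSP levels can be scripted. The following is only a minimal sketch, assuming the two-column output format shown in Figure 110 (node number, then PSSP level); the function name pssp_levels is hypothetical and is not part of PSSP.

```shell
# Summarize output in the "node_number PSSP-level" format of Figure 110.
# Reads that output on standard input and prints one line per distinct
# PSSP level, followed by the number of nodes reporting that level.
pssp_levels() {
    awk 'NF == 2 { count[$2]++ }
         END { for (level in count) print level, count[level] }' | sort
}

# Typical use on the CWS (not run here):
#   splst_versions -tG | pssp_levels
```

If the summary contains more than one line, the nodes are not all at the same PSSP level.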

10.3.1.7 Checking Sysman components: SYSMAN_test 

The SYSMAN_test command is a very powerful test tool. It checks a large
number of SP system management components. We present it in this section
since it is recommended to execute this command after installation of the
CWS and before the installation of the nodes. However, its use is not limited
to the installation phase of the life of your SP system. It can provide
valuable information during normal operation of an SP system.

The SYSMAN_test command is executed on the CWS, but it does not restrict its
checking to components of the CWS. If nodes are up and running, it will also
perform several tests on them. The components checked by SYSMAN_test
include: ntp, automounter, file collection, user management, nfs
daemons, the /.klogin file, and so on.

The output of SYSMAN_test, using the -v (verbose) option, is generally large.
We, therefore, recommend redirecting the output to a file, to prevent
flooding the screen with messages that scroll by too fast, and then using a
file browser or editor to look at the results of the command. An alternative
is to look at the file /var/adm/SPlogs/SYSMAN_test.log, but this file does
not contain all the information provided by the verbose option.
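
The redirection described above can be wrapped in a small helper. This is a sketch under stated assumptions: the helper name run_logged and the output file name are hypothetical, and only the -v option of SYSMAN_test is taken from the text.

```shell
# Run a command, save its standard output and standard error in a file,
# and report the exit status. The helper name run_logged is hypothetical.
run_logged() {
    logfile=$1
    shift
    "$@" > "$logfile" 2>&1
    rc=$?
    echo "exit status $rc; output saved in $logfile"
    return $rc
}

# Typical use on the CWS (not run here; the file name is only an example):
#   run_logged /tmp/SYSMAN_test.out SYSMAN_test -v
#   pg /tmp/SYSMAN_test.out
```

Keeping standard error in the same file preserves the order of messages, which makes the verbose output easier to follow in a pager or editor.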

10.3.1.8 Checking the switch code (ssp.css): CSS_test

The last command we will present to check system installation is CSS_test.
There is no point in using it on a switchless system.

The CSS_test command can be used to check that the ssp.css lpp has been
correctly installed. In particular, CSS_test checks for inconsistencies between
the software levels of ssp.basic and ssp.css. This is why we present this
command in this section. However, it is also useful to run this command on a
completely installed and running system where the switch has been started
since it will also check that communication can be performed over the switch
between the SP nodes.

10.3.2 Verify system partitions 

Two commands are particularly useful for checking the SP system partitions. 

10.3.2.1 Listing existing partitions: splst_syspars

The first of these commands, splst_syspars, only lists the existing partitions
in the SP system. With its only option, -n, it displays the symbolic name of
each partition; without the option, it displays the numeric (IP address)
value:

[sp3en0:/usr/lpp/ssp]# splst_syspars -n
sp3en0
[sp3en0:/usr/lpp/ssp]# splst_syspars
192.168.3.130
[sp3en0:/usr/lpp/ssp]#

10.3.2.2 Verifying system partitions: spverify_config 

The spverify_config command is used to check the consistency of the 
information stored in the SDR regarding the partitions defined in the SP 
system. It is only to be used when the system has more partitions than the
initial default partition.

10.3.3 Checking subsystems

These are some useful commands for checking the different PSSP 
subsystems. 

10.3.3.1 Checking subsystems: lssrc

The lssrc command is not part of PSSP. It is a standard AIX command, part
of the System Resource Controller feature of AIX. It is used to get the status
of a subsystem, a group of subsystems, or a subserver.

In an SP environment, it is especially used to obtain information about the
status of the system partition-sensitive subsystems. To check if these
subsystems are running on the CWS, you can use the lssrc command with the
-a option to get the status of all AIX subsystems, and then filter (grep) the
result on the partition name. In our environment, the result is listed in
Figure 111.


[sp3en0:/]# lssrc -a | grep sp3en0
 sdr.sp3en0        sdr      9032     active
 hats.sp3en0       hats     15144    active
 hags.sp3en0       hags     21984    active
 hagsglsm.sp3en0   hags     104768   active
 haem.sp3en0       haem     17620    active
 haemaixos.sp3en0  haem     105706   active
 hr.sp3en0         hr       37864    active
 pman.sp3en0       pman     102198   active
 pmanrm.sp3en0     pman     25078    active
 Emonitor.sp3en0   emon              inoperative
[sp3en0:/]#

Figure 111. Listing status of system partition-sensitive subsystems on the CWS 
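
When the list is long, it can help to print only the subsystems that are not running. The following is a minimal sketch, assuming the column layout of Figure 111 (subsystem name first, status last); the function name not_active is hypothetical.

```shell
# Print the name and status of every subsystem that is not "active".
# Reads "lssrc -a" style lines on standard input. The subsystem name is the
# first field and the status is the last field; the PID column is empty for
# subsystems that are not running, so the number of fields varies.
not_active() {
    awk 'NF >= 3 && $1 != "Subsystem" && $NF != "active" { print $1, $NF }'
}

# Typical use on the CWS (not run here):
#   lssrc -a | grep sp3en0 | not_active
```

With the output of Figure 111 as input, only the Emonitor.sp3en0 line would be reported.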


You can also use the lssrc command on SP nodes, or to get detailed
information about a particular subsystem. Figure 112 on page 302 shows a long
listing of the status of the Topology Services subsystem on one of the SP
nodes.

[sp3n06.msc.itso.ibm.com:/]# lssrc -l -s hats
Subsystem         Group            PID     Status
 hats             hats             7438    active
Network Name   Indx Defd Mbrs St Adapter ID      Group ID
SPether        [ 0]   13   13  S 192.168.31.16   192.168.31.115
SPether        [ 0]                0x4666fc36      0x46744d3b
HB Interval = 1 secs. Sensitivity = 4 missed beats
SPswitch       [ 1]   12   12  S 192.168.13.6    192.168.13.15
SPswitch       [ 1]                0x4667df4c      0x46682bc7
HB Interval = 1 secs. Sensitivity = 4 missed beats
  2 locally connected Clients with PIDs:
 haemd( 9292) hagsd( 8222)
  Configuration Instance = 912694214
  Default: HB Interval = 1 secs. Sensitivity = 4 missed beats
  CWS = 192.168.3.130
[sp3n06.msc.itso.ibm.com:/]#


Figure 112. Listing topology services information on node sp3n06 

10.3.3.2 syspar_ctrl -E 

The syspar_ctrl command is the PSSP command providing control of the
system partition-sensitive subsystems. In 9.2.11, “Start system
partition-sensitive subsystems” on page 276, we have seen that the -A option
of this command adds and starts the subsystems.

The syspar_ctrl -E command examines all supported subsystems
and reports which of them it can manage.

You can then use the other options of syspar_ctrl to stop, refresh, start, or
delete subsystems that were reported as manageable by syspar_ctrl -E.

10.3.4 Monitoring hardware status 

This monitoring is done through the RS-232 line that connects the control 
workstation and each frame. From the control workstation, the hardmon 
daemon uses a designated tty to connect to each frame supervisor card. 

10.3.4.1 Checking hardware connectivity: spmon_ctest 

The spmon_ctest command runs on the CWS and performs many checks. We 
present it here since it tests hardware connectivity (serial links) between the 
CWS and the SP nodes. However, it also checks that the hardmon and sdr 
daemons are running, that it can communicate with the frame, and that the 
System Monitor has been correctly configured.

We recommend using this command each time a new frame or node has
been added to an SP system, after using the spframe command, to check
that the new nodes have been correctly discovered by PSSP and that they
have been recorded in the SDR.

10.3.4.2 Monitoring hardware activity: spmon -d 

The spmon command is a monitoring and control command. It can retrieve and
display information about the hardware components of the SP system as well
as act on them. We present only a few options here.

The spmon -d -G command displays a summary of the hardware status of all
components: frames, nodes, and switches. It checks that the hardmon
daemon is running and then reports on the power status, key setting, LEDs,
hostresponds and switchresponds, and so on. Figure 113 shows the result of
this command on our CWS.


[sp3en0:/]# spmon -d -G
1. Checking server process
   Process 16262 has accumulated 42 minutes and 14 seconds.
   Check ok
2. Opening connection to server
   Connection opened
   Check ok
3. Querying frame(s)
   1 frame(s)
   Check ok
4. Checking frames
        Controller  Slot 17  Switch  Switch    Power supplies
 Frame  Responds    Switch   Power   Clocking  A   B   C   D
   1    yes         yes      on      0         on  on  on  on
5. Checking nodes
 Frame 1
 Frame  Node    Node          Host/Switch  Key     Env   Front Panel     LCD/LED is
 Slot   Number  Type   Power  Responds     Switch  Fail  LCD/LED         Flashing
   1      1     high   on     yes   yes    normal  no    LCDs are blank  no
   5      5     thin   on     yes   yes    normal  no    LEDs are blank  no
   6      6     thin   on     yes   yes    normal  no    LEDs are blank  no
   7      7     thin   on     yes   yes    normal  no    LEDs are blank  no
   8      8     thin   on     yes   yes    normal  no    LEDs are blank  no
   9      9     thin   on     yes   yes    normal  no    LEDs are blank  no
  10     10     thin   on     yes   yes    normal  no    LEDs are blank  no
  11     11     thin   on     yes   yes    normal  no    LEDs are blank  no
  12     12     thin   on     yes   yes    normal  no    LEDs are blank  no
  13     13     thin   on     yes   yes    normal  no    LEDs are blank  no
  14     14     thin   on     yes   yes    normal  no    LEDs are blank  no
  15     15     wide   on     yes   yes    normal  no    LEDs are blank  no
[sp3en0:/]#

Figure 113. spmon -d -G

You can also query specific hardware information using the query option of
spmon. For example, you can get the Power LED status of node 17:

>spmon -q node17/powerLED/value
1

This option is generally used when writing scripts. For interactive use, it is
easier to use the graphical tools provided by PSSP (see 10.4, “Graphical
user interface” on page 306).
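
As an illustration of such a script, the following sketch loops over a list of node numbers and issues the same query for each one. It assumes only the query form shown above; the function name power_led_report is hypothetical, and the returned value is printed as-is without interpretation.

```shell
# Query the power LED value of each node number given as an argument,
# using the "spmon -q nodeN/powerLED/value" form shown in the text.
# The function name power_led_report is hypothetical.
power_led_report() {
    for node in "$@"; do
        value=$(spmon -q "node$node/powerLED/value") || value="query failed"
        echo "node$node powerLED=$value"
    done
}

# Typical use on the CWS (not run here):
#   power_led_report 1 5 6 7
```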

10.3.5 Monitoring node LEDs: spmon -L, spled

If you only wish to remotely look at the LEDs on the front panel of nodes,
there are alternatives to the spmon -d command:

• spmon -L <node> retrieves the current value of the LED display for one
node.

• spled opens a graphical window on your X terminal and starts monitoring
and displaying in this window the values of the LEDs for all nodes. The
window stays open until you terminate the spled process.

10.3.6 Extracting SDR contents 

The SDR is the main repository for holding information about an SP system. It 
is, therefore, important that you know how to manage the information it 
contains. Many commands are available for this purpose. We only present in 
this section two of these commands. We strongly encourage you to refer to 
the PSSP: Command and Technical Reference, SA22-7351, and to read
about these two commands as well as about all commands whose names 
start with SDR. 

10.3.6.1 SDRGetObjects 

The SDRGetObjects command extracts information about all objects in a class. 
For example, you can list the reliable hostname of all SP nodes: 

[sp3en0:/]# SDRGetObjects Node reliable_hostname
reliable_hostname
sp3n01.msc.itso.ibm.com
sp3n05.msc.itso.ibm.com
sp3n06.msc.itso.ibm.com
sp3n07.msc.itso.ibm.com
sp3n08.msc.itso.ibm.com
sp3n09.msc.itso.ibm.com
sp3n10.msc.itso.ibm.com
sp3n11.msc.itso.ibm.com
sp3n12.msc.itso.ibm.com
sp3n13.msc.itso.ibm.com
sp3n14.msc.itso.ibm.com
sp3n15.msc.itso.ibm.com
[sp3en0:/]#

The output of SDRGetObjects can be long when you display information about
all objects that are defined in a class. You can, therefore, filter the
output with a predicate of the form attribute==value: The command will only
display a result for objects that satisfy the predicate. For example, to
display the node number and the name of the lppsource directory used by only
the multiprocessor nodes in our environment:

[sp3en0:/]# SDRGetObjects Node processor_type==MP node_number lppsource_name
node_number lppsource_name
1 aix432

10.3.6.2 splstdata 

The SDRGetObjects command is very powerful and is often used in SP 
management script files. However, its syntax is not very suitable for everyday 
interactive use by the SP administrator since it requires that you remember 
the exact spelling of classes and attributes. PSSP provides a front end to 
SDRGetObjects for the most often used queries: splstdata. This command 
offers many options. We have already presented options -a, -b, -f, and -n in 
Section 9.2.4, “Check the previous installation steps” on page 271. You must 
also know how to use: 


splstdata -v    To display volume group information (PSSP 3.1 only)
splstdata -s    To access switch information
splstdata -h    To extract hardware configuration information
splstdata -i    To display node IP configuration
splstdata -e    To display site environment information


10.3.7 Checking IP connectivity: ping/telnet/rlogin 

The availability of IP communication between the CWS and the SP nodes is 
critical for the successful operation of the SP system. However, PSSP does 
not provide any tool to check the TCP/IP network since there is nothing 
specific to the SP in this area. Common TCP/IP commands can be used in 
the SP environment: ping, telnet, rlogin, traceroute, netstat, arp, and so
on. These commands will return information for all IP connections, including 
the SP Ethernet service network and the Switch network if it has been 
configured to provide IP services. For example, running the arp -a command 
on node 6: 


[sp3n06.msc.itso.ibm.com:/]# arp -a
? (192.168.13.4) at 0:3:0:0:0:0
sp3sw05.msc.itso.ibm.com (192.168.13.5) at 0:4:0:0:0:0
sp3sw07.msc.itso.ibm.com (192.168.13.7) at 0:6:0:0:0:0
sp3n01en1.msc.itso.ibm.com (192.168.31.11) at 2:60:8c:e8:d2:e1 [ethernet]
sp3n05.msc.itso.ibm.com (192.168.31.15) at 10:0:5a:fa:13:af [ethernet]
sp3n07.msc.itso.ibm.com (192.168.31.17) at 10:0:5a:fa:13:d1 [ethernet]
[sp3n06.msc.itso.ibm.com:/]#

shows that IP communications have already been established between node
6 and node 7 through the Ethernet network as well as through the switch.
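
A basic reachability check over a list of hostnames can also be scripted with ping. This is only a sketch: the function name check_ping is hypothetical, and it assumes a ping command that accepts -c to limit the number of packets sent.

```shell
# Send one ping to each hostname given as an argument and report whether
# it answered. The function name check_ping is hypothetical.
check_ping() {
    for host in "$@"; do
        if ping -c 1 "$host" > /dev/null 2>&1; then
            echo "$host reachable"
        else
            echo "$host NOT reachable"
        fi
    done
}

# Typical use (not run here), with hostnames from our environment:
#   check_ping sp3n05.msc.itso.ibm.com sp3n07.msc.itso.ibm.com
```

The list of hostnames could, for instance, be taken from the reliable_hostname attribute of the Node class in the SDR.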

10.3.8 SMIT access to verification commands 

Many of the commands listed previously can be accessed through SMIT. 

Each option of the splstdata command can be called from an entry in the SMIT
List Database Information window (smitty list_data) or one of its subwindows.

Figure 114 presents the SMIT RS/6000 SP Installation/Configuration
Verification window (smitty SP_verify). The first six entries in this window
respectively correspond to spmon_itest, spmon_ctest, SDR_test, SYSMAN_test,
CSS_test, and spverify_config. The last three correspond to commands we
did not mention in this section: st_verify, jm_install_verify, and
jm_verify.


RS/6000 SP Installation/Configuration Verification

Move cursor to desired item and press Enter.

  System Monitor Installation
  System Monitor Configuration
  System Data Repository
  System Management
  Communication Subsystem
  System Partition Configuration
  Job Switch Resource Table Services Installation
  Resource Manager Installation
  Resource Manager Configuration

F1=Help     F2=Refresh     F3=Cancel     F8=Image
F9=Shell    F10=Exit       Enter=Do


Figure 114. SMIT verification window 


10.4 Graphical user interface 

PSSP provides an alternative to the use of the command line interface or the 
SMIT panels for monitoring a system. You can use two graphical interfaces for 
that purpose. These interfaces are started by the commands: spmon -g and 
perspectives. 

The spmon -g command is available in all versions of PSSP up to release 2.4.
Perspectives has been available since PSSP 2.2. All graphical tools
used for management of an SP system are now accessible through the
Perspectives Launch Pad, and they have been greatly enhanced in PSSP 3.1. In
PSSP 3.1, the spmon -g functions have been replaced by the SP
Hardware Perspective tool.

It is impossible in a book such as this study guide to provide a complete 
description of all the features of the new PSSP Perspectives User Interface. 
All monitoring and control functions needed to manage an SP system can be 
accessed through this interface. We, therefore, recommend that you refer to 
SP Perspectives: A New View of Your SP System, SG24-5180, for further 
information about this tool. Another good source of information is the 
Perspectives online help available from the Perspectives Launch Pad. 

The Perspectives initial panel, the Launch Pad, is customizable. You can add
icons to this panel for the actions you use often. By default, the Launch Pad
contains shortcuts to some of the verification commands we have presented
in previous sections:

• Monitoring of hostResponds, switchResponds, nodePowerLEDs

• SMIT SP_verify

• syspar_ctrl -E



[Figure: SP Perspectives Launch Pad window on sp5en0, with Window, Actions,
Options, and Help menus. Icons shown include the Hardware Perspective, the
Event Perspective, the System Partitioning Aid, hardware monitoring and
management shortcuts, Perspectives online help, SMIT shortcuts (config data
on CWS, cluster_mgmt, SP verify on CWS, splogmgt, spusers on CWS, supervisor
on CWS, devices), syspar_ctrl -A, and syspar_ctrl -E.]

Figure 115. Perspectives Launch Pad 

If you decide to perform most of your SP monitoring through the Perspectives
tools, we recommend that you add your favorite tools to the Launch Pad.


10.5 Key daemons

The management of an SP system relies heavily on the availability of several 
daemons. It is important that you understand the role of these daemons. 
Furthermore, you should know, for the most important daemons, how they 
are started, how to check that they are running, and how they interact. 

The SP related daemons are listed in Table 25. 

Table 25. SP Daemons 


Hardware monitoring                   hardmon, S70d
SDR                                   sdrd
Switch fault handling                 fault_service_Worm_RTG_SP, also known
                                      as the Worm
Switch management                     cssadm, css.summlog
System partition-sensitive daemons    haemd, hagsd, hagsglsmd, hatsd, hrd
Kerberos daemons                      kadmind, kerberos, kpropd
Event and Problem management          pmand, pmanrmd
SP SNMP trap generator                sp_configd
Hardware events logging               splogd
SNMP manager                          spmgrd
File collection                       supfilesrv
Job Switch Resource Table Services    switchtbld
Sysctl                                sysctld
Network Time Protocol                 xntpd

In the following sections, we provide a very brief description of some of
these daemons.


10.5.1 Sdrd 

The sdrd daemon runs on the CWS. It serves all requests from any client
application to manipulate SDR information. It is managed using the AIX SRC
commands. There is an entry for sdrd in /etc/inittab, and sdrd is started at
CWS boot time. This daemon must be running before any SP management
action can be performed.

You can use any of the following commands to check that sdrd is running:

ps -ekf | grep sdrd 
lssrc -g sdr 
SDR_test 
splstdata -e 


10.5.2 Hardmon 

The hardmon daemon runs on the CWS. It manages the serial ports of the
CWS that are connected to the SP frames. It controls all frame and node
hardware through an SP-specific protocol for communicating over the serial
links. It also manages the S70d daemon, which performs the hardware
monitoring of non-SP frames over serial links. There is an entry for hardmon
in /etc/inittab, and it is started at CWS boot time.

No management of the SP hardware can be performed until the hardmon 
daemon is running. It is, therefore, important that you verify that this daemon 
is always running on the CWS. You can check hardmon with one of the 
following commands: 

ps -ekf | grep hardmon 
lssrc -s hardmon 
spmon_ctest 
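
For use in a script, the lssrc check can be turned into a test that returns an exit status. The following is a minimal sketch, assuming the lssrc output layout shown in Figure 112 (subsystem name first, status last); the function name is_active is hypothetical.

```shell
# Return 0 if "lssrc -s <subsystem>" reports the subsystem as active,
# and a nonzero status otherwise. The function name is_active is hypothetical.
is_active() {
    lssrc -s "$1" |
        awk -v name="$1" '$1 == name && $NF == "active" { found = 1 }
                          END { exit !found }'
}

# Typical use on the CWS (not run here):
#   is_active hardmon || echo "hardmon is not running"
```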


10.5.3 Worm 

The worm runs on all SP nodes in an SP system equipped with a switch. The
worm is started by the rc.switch script, which is started at node boot time.
The worm must be running on the primary node before you can start the
switch with the Estart command. We recommend that you refer to Chapter 14
of the PSSP: Administration Guide, GC23-3897, for PSSP 2.4, or the PSSP:
Administration Guide, SA22-7348, for PSSP 3.1, for more details about the
switch daemons.

10.5.4 Topology Services, Group Services, and Event Management 

The Topology Services, Group Services, and Event Management subsystems
are managed by the PSSP syspar_ctrl command (refer to 10.3.2, “Verify
system partitions” on page 300).

These subsystems are closely related. The Topology Services subsystem
provides information about the SP system to the Group Services and Event
Management subsystems, which rely on this information to offer their own
services to other client applications. VSD, RVSD, and GPFS are examples of
client applications of the Topology Services, Group Services, and Event
Management subsystems.

We recommend that you refer to Chapters 22, 23, and 24 of the PSSP:
Administration Guide, GC23-3897, for PSSP 2.4, or the PSSP: Administration
Guide, SA22-7348, for PSSP 3.1, for more details about Topology
Services, Group Services, and Event Management.


10.6 SP-specific logs 

Since SP systems are complex, the amount of data that an SP administrator 
may need to look at to manage such systems is far beyond what can 
reasonably be gathered in one file or displayed on one screen. 

The various components of PSSP, therefore, store information about their 
processing in several different logs. PSSP generates information in about 30 
log files. A complete list of all these logs can be found on page 77, Chapter 4 
"Error Logging Overview", of the PSSP Diagnosis Guide, GA22-7350. 

Most of the SP-related logs can be found in /var/adm/SPlogs on the CWS and 
on the SP nodes. A few other logs are stored in /var/adm/ras and 
/var/tmp/SPlogs. 

You generally only look at logs for problem determination. For the purpose of 
this chapter (verifying the PSSP installation and operation), we will only 
mention the /var/adm/SPlogs/sysman directory. On each SP node, this 
directory contains the trace of the AIX and PSSP installation, their 
configuration, and the execution of the customization scripts described in 
Section 9.3.2, “/tftpboot” on page 286. We recommend that you look at this 
log after the installation of a node to check that it has successfully completed. 
The installation of a node involves the execution of several processes that 
are not linked to a terminal (scripts defined in /etc/inittab, for example). You 
may not notice that some of these scripts have failed if you do not search for 
indications of their completion in the /var/adm/SPlogs/sysman directory. 


10.7 Related documentation 

For complete reference and ordering information for the documents listed in 
this section, see Appendix D, “Related publications” on page 501. 

SP Manuals 

The reader can refer to the related document for Version 2.4 of PSSP: 

PSSP: Installation and Migration Guide, GC23-3898 for PSSP 2.4. The 
installation of an SP system is a long process involving several steps (up to 
50 or more depending on the complexity of the system). Therefore, several 
verifications can be performed during installation to ensure that the already 
executed steps have been completed correctly. Chapter 2 of this guide 
documents the use of these verification methods during the SP installation. 

PSSP: Command and Technical Reference, GC23-3900, for PSSP 2.4 and 
PSSP: Command and Technical Reference, SA22-7351 for PSSP 3.1 contain 
a complete description of each command listed in 10.3, “Key commands” on 
page 297. 

Chapter 19 of PSSP: Diagnosis and Messages Guide, GC23-3899 for PSSP 
2.4 and Chapter 24 of PSSP Diagnosis Guide, GA22-7350, for PSSP 3.1 
describe, in detail, the verification of System Management installation using 
the SYSMAN_test command. 

PSSP: Administration Guide, GC23-3897, for PSSP 2.4 and PSSP: 
Administration Guide, SA22-7348, for PSSP 3.1. Chapter 14 describes the 
switch-related daemons, while Chapters 22, 23, and 24 provide you with 
detailed information about the partition-sensitive subsystems and their 
daemons. 

SP Redbooks 

RS/6000 SP Monitoring: Keeping It Alive, SG24-4873. Chapter 5 provides 
you with a detailed description of the Perspectives graphical user interface. 

SP Perspectives: A New View of Your SP System, SG24-5180, is entirely 
dedicated to explaining the use of Perspectives but only addresses Version 
3.1 of PSSP. 


10.8 Sample questions 

This section provides a series of questions to help you prepare for 
the certification exam. The answers to these questions can be found in 
Appendix A. 

1. PSSP provides several tools and scripts for checking components and 
verifying that they are working properly. Which command can be used to 
verify that the SDR has been properly set up and that it is working fine? 

A. test_SDR 

B. SDR_itest 

C. SDR_ctest 

D. SDR_test 

2. How do you obtain frame, switch, and node hardware information in PSSP 
3.1? 


A. Run the command spmon -g. 

B. Run the command SDRGetObjects Hardware. 

C. Run the command spmon -d. 

D. Run the command spmon -g -d. 

3. The hardmon daemon runs on the control workstation only. Which of the 
following statements is false? 

A. It uses the RS-232 lines to contact the frame supervisor cards. 

B. It is a partition-sensitive daemon. 

C. It requires read/write access to each tty connected to frames. 

D. It logs information in the /var/adm/SPlogs/hardmon directory. 

4. Which of the following Worm characteristics is FALSE? 

A. The worm runs on all SP nodes in an SP system equipped with a 
switch. 

B. The worm is started by the rc.switch script. 

C. The worm must be started manually. 

D. The worm must be running on the primary node before you can start 
the switch. 

5. Which command checks a large number of SP system management 
components? 

A. spmon_ctest 

B. SYSMAN_test 

C. test_SYSMAN 

D. css_test 


10.9 Exercises 

Here are some exercises you may wish to perform: 

1. Familiarize yourself with the different verification and monitoring 
commands documented in the chapter. 

2. Use the various flags of the splstdata command to extract data from the 
SDR. 

3. Familiarize yourself with some of the key daemons documented in section 
10.5. 


Chapter 11. Understanding additional SP-related products 


In addition to PSSP, several products are used in the RS/6000 SP environment 
to provide workload management, connectivity, higher availability, and so on. 
This chapter provides an overview of some of these products. 


11.1 Key concepts you should know 

Although most of these products are not essential for any SP installation, they 
are commonly found in customer environments. In preparation for the SP 
Certification exam, you should understand how the following products work 
and what solutions they provide: 

• LoadLeveler 

• Performance Toolbox Parallel Extension (PTPE) 

• High Availability Control Workstation (HACWS) 

• NetTAPE 

• Client Input Output/Sockets (CLIO/S) 


11.2 Understanding LoadLeveler 

LoadLeveler is a software program designed to automate workload 
management. In essence, it is a scheduler that also has facilities to build, 
submit, and manage jobs. The jobs can be processed by any one of a number 
of machines, which together are referred to as the LoadLeveler cluster. Any 
stand-alone RS/6000 may be part of a cluster, although LoadLeveler is most 
often run in the RS/6000 SP environment. A sample LoadLeveler cluster is 
shown in Figure 116. 

Figure 116. Example LoadLeveler configuration 

Important concepts in LoadLeveler are: 

Cluster. A group of machines that are able to run LoadLeveler jobs. Each 
member of the cluster has the LoadLeveler software installed. 

Job. A unit of execution processed by LoadLeveler. A serial job runs on a 
single machine. A parallel job is run on several machines simultaneously and 
must be written using a parallel language Application Programming Interface 
(API). As LoadLeveler processes a job, the job moves into various job states, 
such as Pending, Running, and Completed. 

Job Command File. A formal description of a job written using LoadLeveler 
statements and variables. The command file is submitted to LoadLeveler for 
scheduling of the job. 

Job Step. A job command file specifies one or more executable programs to 
be run. The executable and the conditions under which it is run are defined in 
a single job step. The job step consists of several LoadLeveler command 
statements. 

By way of example, Figure 117 on page 317 schematically illustrates a series 
of job steps. In this figure, data is read from storage in job step one. 
Depending on the exit status of this operation, the job is either terminated or 
continues on to job step two. Again, LoadLeveler examines the exit status of 
job step two and either terminates the job or proceeds to job step three, 
which, in this example, prints the data that the user requires. 


Figure 117. A LoadLeveler job 
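
The exit-status chaining illustrated in Figure 117 can be modeled with a few lines of Python. This is only a sketch of the control flow under simplifying assumptions: real job steps are defined with LoadLeveler statements in a job command file, not in Python, and the names run_job and the step labels are hypothetical.

```python
def run_job(steps):
    """Run job steps in order and stop at the first nonzero exit status.

    `steps` is a list of (name, callable) pairs. Each callable returns an
    integer exit status, where 0 means success.
    """
    completed = []
    for name, step in steps:
        status = step()
        if status != 0:
            # A failed step terminates the job; later steps never run.
            return completed, f"terminated at {name} (exit status {status})"
        completed.append(name)
    return completed, "completed"


# Hypothetical three-step job: read data, process it, print the result.
steps = [
    ("read", lambda: 0),
    ("process", lambda: 0),
    ("print", lambda: 0),
]
print(run_job(steps))
```

If the second callable returned a nonzero status, only the first step would be reported as completed and the print step would not run.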


11.2.1 A breakdown of how it works 

There are three important functional machine types in LoadLeveler. 

Scheduling machine. When a job is submitted to LoadLeveler, it gets placed 
in a queue that is managed by the scheduling machine. The latter then asks 
the central manager to find a machine that can process the job. 

Central manager machine. This machine evaluates the resources required 
by the job that were specified in the job command file and selects a machine 
that is capable of running it. The central manager is also called the negotiator. 

Executing machines. Machines that are assigned and run jobs. 

Figure 118 on page 318 shows how these machine types fit together and the 
order in which they communicate. 

1. A job has been submitted to LoadLeveler. 

2. The scheduling machine contacts the central manager to inform it that 
a job has been submitted and to find out if there is a machine available 
that matches the job’s requirements. 

3. The central manager checks to determine if a machine exists that is 
capable of running the job. Once a machine is found, the central 
manager informs the scheduling machine which machine is available. 

4. The scheduling machine contacts the executing machine and sends it 
the job information and executable program. The executing machine 
sends job status information to the scheduling machine and notifies it 
when the job has completed. 



Figure 118. LoadLeveler job flow 

In addition, there is another type of machine known as a submit-only 
machine. As its name indicates, this type of machine can only submit jobs, 
although it is also able to query and cancel them. 

Jobs do not get dispatched to the executing machines on a first-come, 
first-served basis unless LoadLeveler is specifically configured to run that 
way, that is, with a first-in, first-out (FIFO) queue. Instead, the negotiator 
calculates a priority value for each job, called SYSPRIO, that determines 
when the job will run. Jobs with a high SYSPRIO value will run before those 
with a low value. 

The system administrator can specify several different parameters that are 
used to calculate SYSPRIO. Examples are how many other jobs the 
user already has running, when the job was submitted, and what priority the 
user has assigned to it. Users assign priorities to their own jobs by using 
the user_priority keyword in the job command file. 

SYSPRIO is referred to as a job’s system priority, whereas the priority that 
users assign to their own jobs is called user priority. If two jobs have the same 
SYSPRIO calculated for them by LoadLeveler, then the job that runs first will 
be the job that has the higher user priority. 
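
The ordering rule just described (higher SYSPRIO first, user priority as the tie-breaker) can be sketched in Python as follows. This is an illustrative model only, not LoadLeveler code; the Job class, the dispatch_order function, and the numeric values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    sysprio: int        # system priority calculated by the negotiator
    user_priority: int  # priority the user assigned to the job


def dispatch_order(jobs):
    """Order jobs by SYSPRIO (highest first), then by user priority."""
    return sorted(jobs, key=lambda j: (-j.sysprio, -j.user_priority))


queue = [
    Job("A", sysprio=100, user_priority=10),
    Job("B", sysprio=200, user_priority=1),
    Job("C", sysprio=100, user_priority=50),
]
print([job.name for job in dispatch_order(queue)])
```

Job B runs first because it has the highest SYSPRIO. Jobs A and C tie on SYSPRIO, so C goes ahead of A on user priority.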

The priority of a job in the LoadLeveler queue is completely separate and 
must be distinguished from the AIX nice value, which is the priority that AIX 
gives to the process running the executable program. 

LoadLeveler also supports the concept of job classes. These are defined by 
the system administrator and are used to classify particular types of jobs. For 
example, we might define two classes of jobs that run in the cluster, called 
night and day. We might specify that executing machine A, which is very 
busy during the day because it supports a lot of interactive users, should only 
run jobs in the night class. However, machine B, which has a low workload in 
the day, could run both. LoadLeveler can be configured to take job class into 
account when it calculates SYSPRIO for a job. 

Just as SYSPRIO is used for prioritizing jobs, LoadLeveler also has a way of 
prioritizing executing machines. It calculates a value called MACHPRIO for 
each machine in the cluster. The system administrator can specify several 
different parameters that are used to calculate MACHPRIO, such as load 
average, number of CPUs, the relative speed of the machine, free disk space, 
and the amount of memory. 

Machines may be classified by LoadLeveler into pools. Machines with similar 
resources (for example, a fast CPU) might be grouped together in the same 
pool so that they can be allocated CPU-intensive jobs. A job can specify as 
one of its requirements that it will run on a particular pool of machines. In this 
way, the right machines can be allocated the right jobs. 
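
To see how job classes, pools, and MACHPRIO interact, consider the following Python sketch. It is a simplified model, not LoadLeveler's algorithm: it assumes that MACHPRIO has already been calculated for each machine and that the eligible machine with the highest value is preferred. The Machine class, the pick_machine function, and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    pool: str
    classes: frozenset  # job classes this machine is allowed to run
    machprio: float     # precomputed machine priority


def pick_machine(machines, job_class, pool=None):
    """Pick the eligible machine with the highest MACHPRIO, or None."""
    candidates = [
        m for m in machines
        if job_class in m.classes and (pool is None or m.pool == pool)
    ]
    return max(candidates, key=lambda m: m.machprio, default=None)


machines = [
    Machine("A", "general", frozenset({"night"}), 5.0),
    Machine("B", "general", frozenset({"night", "day"}), 2.0),
    Machine("C", "fastcpu", frozenset({"night", "day"}), 9.0),
]
print(pick_machine(machines, "day", pool="general").name)
```

Machine A mirrors the busy interactive machine in the text: it is never chosen for a day job, whatever its MACHPRIO.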


11.3 Understanding PTPE 

The Performance Toolbox (PTX/6000) is a performance analysis tool for 
standalone RS/6000 machines. PTPE is a parallel extension to this tool that 
enables performance monitoring and analysis on large SP systems. In 
addition to the capabilities of PTX/6000, PTPE provides: 

• Collection of SP-specific data. PTPE provides ptpertm, an additional 
data supplier that complements the data xmservd collects. The 
SP-specific performance data is currently implemented for: 

- SP Switch 

- LoadLeveler 

- VSD 

• SP runtime monitoring. The system administrator should have a global 
view of SP performance behavior. With reference to Figure 119, similar 
nodes of the first tier, or Collectors, can be grouped and their performance 
data summarized by their respective Data Manager node in the second 
tier. This way, large SP systems can be easily monitored from a single 
presentation application by viewing node groups instead of individual 
nodes. The Data Managers are administered by the Central Coordinator in 
the third tier. The Central Coordinator aggregates the Data Managers’ 
summary data to provide a total performance overview of the SP. Of 
course, the base PTX/6000 monitoring functions can be used to focus on 
any particular performance aspect of an individual node. 



Figure 119. PTPE monitoring hierarchy 

• Data Analysis and Data Relationship Analysis. PTPE provides an API 
to allow analysis applications to sift through all requested data. The data 
archive created by PTPE exists on every node and is completely 
accessible through the API. In base PTX/6000, performance data is 
analyzed with the azizo utility, which is restricted to simple derivatives, 
such as maximum, minimum, and average. With the PTPE API, programs 
of any statistical complexity can be written to find important trends or 
relationships. Also, data captured for azizo use is far more limited with 
base PTX/6000. 

In PSSP 3.1 and later, PTPE is included in PSSP at no extra charge. For 
levels prior to PSSP 3.1, PTPE is a separately orderable, priced feature 
of the PSSP LPP. 
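
The three-tier monitoring hierarchy described in this section (Collectors, Data Managers, Central Coordinator) can be pictured with a short Python sketch. It does not use the PTPE API. The function names, the group names, and the sample values are assumptions chosen only to show how per-group summaries can be merged into a system-wide overview.

```python
def summarize(samples):
    """Data Manager role: reduce one node group's samples to a summary."""
    return {
        "count": len(samples),
        "total": sum(samples),
        "min": min(samples),
        "max": max(samples),
    }


def combine(summaries):
    """Central Coordinator role: merge group summaries into one overview."""
    merged = {
        "count": sum(s["count"] for s in summaries),
        "total": sum(s["total"] for s in summaries),
        "min": min(s["min"] for s in summaries),
        "max": max(s["max"] for s in summaries),
    }
    # Dividing the overall total by the overall count weights each node equally.
    merged["mean"] = merged["total"] / merged["count"]
    return merged


# Hypothetical CPU utilization samples, one value per Collector node.
groups = {"compute": [10, 20, 30], "io": [50, 70]}
overview = combine([summarize(values) for values in groups.values()])
print(overview)
```

Carrying counts and totals upward, rather than per-group averages, keeps the system-wide mean correct when the node groups differ in size.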


11.4 Understanding HACWS 

HACWS is an optional collection of components that implement a backup 
CWS for an SP. The backup CWS takes over when the primary control 
workstation requires an upgrade or service, or fails. The HACWS components 
are: 

• A second RS/6000 machine supported for CWS use. 

• The HACWS connectivity feature (#1245) ordered against each frame in 
the system. This furnishes a twin-tail for the RS-232 connection so that 
both the primary and backup CWSs can be physically connected to the 
frames. 

• HACMP for AIX installed on each CWS. HACWS is configured as a 
two-node rotating HACMP cluster. 

• The HACWS feature of PSSP. This software provides SP-specific cluster 
definitions and recovery scripts for CWS failover. This feature is 
separately orderable and priced and does not come standard with PSSP. 

• Twin-tailed external disk, physically attached to each CWS, to allow 
access to data in the /spdata file system. 

An HACWS cluster is depicted in Figure 120 on page 322. 

If the primary CWS fails, the backup CWS can assume all CWS functions with 
the following exceptions: 

• Updating passwords (if SP User Management is in use) 

• Adding or changing SP users 

• Changing Kerberos keys (the backup CWS is typically configured as a 
secondary authentication server) 

• Adding nodes to the system 

• Changing site environment information 


11.5 Understanding NetTAPE 

NetTAPE lets you manage a group of tape devices from a single workstation 
or multiple workstations using either a Motif/X-Window System-based 
graphical user interface or a set of commands. 

NetTAPE can: 

Consolidate control of distributed tape operations 

NetTAPE provides a single system image of all of the network's tape 
devices. Tape device allocation, mount queue management, and tape 
device monitoring functions are performed using a graphical user 
interface. 

Customize operator views of tape operations 

NetTAPE allows you to assign each tape device to an operator domain. 
Using the NetTAPE GUI, operators limit the display of tape devices to 
those in their own domain. They see only the devices for which they are 
responsible. 

Use tape device pools to process mount requests more efficiently 

NetTAPE lets you create pools of tape devices organized by device type. 
With device pools, mount requests for a certain type of device can be 
satisfied by any device in the pool. As a result, mount requests can be 
processed more quickly and efficiently. 

Support advanced tape devices and features 

NetTAPE Tape Library Connection (NetTAPE TLC) supports advanced 
tape devices, such as the IBM 3494, 3495, and 3575 Tape Library 
Dataservers, and StorageTek Tape Libraries. It also supports the 
automatic cartridge loading functions of several types of tape devices. 
NetTAPE lets installations take advantage of the large capacity and 
automatic features of these tape devices in an AIX environment. 

With the use of ADSM device drivers, a myriad of SCSI-attached 
autochangers and libraries, from small 16 GB Autochangers to 14.4 TB 
libraries, are also supported. 

Coexist with ADSM and CLIO/S 

NetTAPE works with ADSM for AIX Version 2.1 and can coexist on the 
same network with IBM's CLIO/S. The IBM 3494, 3495, and 3575 Tape 
Library Dataservers, StorageTek Tape Libraries, and SCSI-attached 
libraries and autochangers can be managed by NetTAPE and shared with 
ADSM, thereby allowing you to make better use of tape resources. 

Starting in Version 1, Release 2, NetTAPE TLC supports remote devices 
and esoteric device pools for ADSM. This eliminates the requirement that 
devices accessed by ADSM be physically located on the same node as the 
ADSM server. 


11.6 Understanding CLIO/S 

IBM Client Input Output/Sockets (CLIO/S) is a set of commands and 
application programming interfaces (API) that can be used for high-speed 
communication and for accessing tape devices on a network of AIX 
workstations and MVS mainframes. CLIO/S makes it easier to distribute work 
and data across a network of mainframes, workstations, and RS/6000 SP 
systems. CLIO/S also provides an API to tape drives anywhere in your 
network. CLIO/S can be used to: 

• Quickly move data between your MVS/ESA system and your workstation 
(or SP). For example, you can store large volumes of seismic data on tape 
and manage it using a mainframe acting as a data server to multiple 
workstations. This solution retains tape management as the responsibility 
of a single mainframe system while permitting seismic processing capacity 
to increase by distributing the work. 

• Transfer very large files. For example, you can use applications on AIX to 
update customer files during the day, then use CLIO/S for fast backups to 
take advantage of MVS as a file server with extensive data management 
capabilities. Using CLIO/S for frequent file copying can mean shorter 
interruptions to your ongoing applications. 

• Transfer files using familiar workstation commands. The CLIO/S CLFTP 
subcommands are similar to those of TCP/IP's ftp command; so, there's 
no need for users to learn a new interface. Users can even access tape 
data on MVS with the CLFTP subcommands. 

• Access a tape drive on MVS from your workstation as though it were a 
local tape drive. For example, you can store data on MVS controlled tape 
drives and access it using CLIO/S connections to the compute servers. 

• Start servers on other workstations and mainframes in your network to 
create a parallel processing environment. For example, CLIO/S can be 
used to schedule work on several workstations running in parallel. It also 
provides high data transfer rates and low processor utilization, permitting 
very high parallel efficiency. 

• Use AIX named pipes and BatchPipes/MVS. For example, you can access 
data on MVS (either on DASD or on tape) with an AIX named pipe. Or, an 
MVS program can use an MVS BatchPipe to send its output to AIX where 
another program using an AIX named pipe can do further processing. 


11.7 Related documentation 

For this chapter, you do not need to understand the details of installation 
or configuration, but a general understanding of the functionality of these 
products is advised. 

SP Manuals 

Product manuals are very helpful for installing, configuring, and managing 
these products. If you are interested in installing and configuring these 
products, you can consult the product manuals listed in Appendix D, “Related 
publications” on page 501. 

SP Redbooks 

There are many redbooks that cover each one of these products in great 
detail. However, since the idea is to get an understanding only, we 
recommend the redbook Inside the RS/6000 SP, SG24-5145. This book 
covers most of the products in more detail than they appear here; so, this 
redbook may be useful if you want to explore the product in greater depth. 


11.8 Sample questions 

This section provides a series of questions to help you prepare for 
the certification exam. The answers to these questions can be found in 
Appendix A. 

1. In planning for the use of LoadLeveler to run periodic batch jobs across 
several nodes, one requirement that is key to the use of LoadLeveler 
states that a flat UID namespace is required across all nodes in a 
LoadLeveler cluster. Why is this? 

A. LoadLeveler runs different jobs from a variety of client machines to a 
number of server machines in the defined LoadLeveler Cluster and, 
due to standard UNIX security requirements, must be able to depend 
on the UID being consistent across all nodes defined to the Cluster. 

B. If such a namespace is not established, LoadLeveler will not be able to 
properly distinguish one UID from another, which may disrupt its 
capabilities for managing parallel jobs. 

C. LoadLeveler runs different jobs from a variety of client machines to a 
number of server machines in the defined LoadLeveler Cluster and, due 
to standard hostname resolution differences between machines, 
depends on the /etc/hosts file being present even if DNS is 
implemented. 

D. A flat UID namespace is optional, but more efficient load-balancing can 
be achieved using this approach. 

2. An HACWS environment requires which of the following to connect the two 
CWSs to the frame? 

A. An SCSI Target Mode Cable. 

B. An additional Ethernet adapter for the frame supervisor card. 

C. A Y-cable to link the two serial cables to the one port. 

D. A null-modem cable. 

3. In an HACWS environment, if the primary control workstation fails, the 
backup CWS assumes all functions of the primary CWS. Which of the 
following functions is an exception to the previous statement? 

A. Authentication server 

B. Boot/install server 

C. Hardware monitoring 

D. Adding or changing SP users 


11.9 Exercises 

Here are some exercises you may wish to perform: 

1. Familiarize yourself with the key characteristics of NetTAPE and CLIO/S. 

2. Explore the workflow characteristics of LoadLeveler. 

3. What capabilities does PTPE provide? 

4. What are the components of the HACWS? 


Chapter 12. Application-specific resources 


Once PSSP has been installed and configured, you may need to install and 
configure additional products before you can start using your applications. 
Although some of these products are not part of PSSP, they are usually 
installed and configured in RS/6000 SP environments. 

This chapter provides the basic concepts and setup procedures for 
understanding, installing, and configuring additional RS/6000 SP products. 


12.1 Key concepts you should study 

Before taking the exam, make sure you understand the following concepts: 

• How the IBM Virtual Shared Disk works and what solution it provides. 

• Which filesets are part of the VSD packaging and where they should be 
installed. 

• How you create and configure VSD nodes and disks. 

• How you manage VSD nodes and disks. 

• How the IBM Recoverable Virtual Shared Disk works and what solution it 
provides. 

• What the hardware prerequisites are for installing and configuring RVSD. 

• Which filesets are part of the RVSD packaging and where they should be 
installed. 

• How you set up and manage an RVSD environment. 

• What the General Parallel File System (GPFS) is. 

• What the hardware prerequisites are for installing and configuring GPFS. 

• How you configure and manage GPFS. 

• What NetTAPE is. 


12.2 IBM Virtual Shared Disks 

IBM Virtual Shared Disk (VSD) allows data stored in logical devices 
(logical volumes) to be accessed transparently from remote nodes. VSD is a 
thin layer of software that runs between the logical device and the Logical 
Volume Manager (LVM) as shown in Figure 121 on page 328. 

In Figure 121, there are two logical devices (lv_X and lv_Y), owned by 
Node X and Node Y, respectively. When applications in Node X 
need to access lv_X, they will go through the logical volume manager as 
usual for local access. However, when they need to access lv_Y, which is 
remote, the VSD layer will take the request and ship it through a TCP/IP 
network (in this case, the SP Switch) to the disk server for lv_Y. For the 
application on Node X, both accesses are of the same kind (both access 
special devices named /dev/lv_X and /dev/lv_Y, respectively). 
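
The routing decision made by the VSD layer can be pictured with the following Python sketch. It is a conceptual model only, not the VSD device driver: the route_request function and the ownership table are assumptions that mirror the lv_X and lv_Y example above.

```python
def route_request(requesting_node, device, owners):
    """Describe how an I/O request for `device` is served.

    `owners` maps each logical device to the node that owns it.
    """
    owner = owners[device]
    if owner == requesting_node:
        # Local access goes through the logical volume manager as usual.
        return f"local: LVM on {owner} serves {device}"
    # Remote access is shipped over the network to the owning node.
    return f"remote: VSD ships request for {device} to server {owner}"


owners = {"lv_X": "Node X", "lv_Y": "Node Y"}
print(route_request("Node X", "lv_X", owners))
print(route_request("Node X", "lv_Y", owners))
```

The application calls the same function in both cases, which is the point of the design: it does not need to know where the data lives.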

The nodes that manage physical disks are called VSD servers, and those that 
only access VSD disks are called VSD clients. A VSD server can also be a VSD 
client. 

In order to use VSD, it is necessary to install the VSD filesets on all the nodes 
that are going to be using or managing VSD disks. The VSD filesets have 
changed from PSSP 2.4 to PSSP 3.1, as shown in Table 26. 

Table 26. VSD filesets 

PSSP 2.4 or below        PSSP 3.1                Description 
ssp.csd.cmi              vsd.cmi                 VSD SMIT Panels 
ssp.csd.vsd              vsd.vsdd                VSD Device Driver 
ssp.csd.hsd              vsd.hsd                 VSD Hash Shared Disk 
ssp.csd.sysctl           vsd.sysctl              VSD Sysctl Commands 
ssp.csd.gui              ssp.vsdgui              VSD Perspective 
ssp.csd.loc.ma_RP.gui    ssp.vsdgui.msg.ma_RP    VSD Perspective Locale Information 
ssp.csd.msg.ma_RP.gui    ssp.vsdgui.msg.ma_RP    VSD Perspective Messages 



12.2.1 Installing IBM Virtual Shared Disk 

Before you install VSD on your nodes and control workstation, make sure 
that you are using the right level of AIX and PSSP. The VSD components 
have some prerequisites in terms of AIX and PSSP level as described in 
8.6.1, “PSSP prerequisites” on page 258. 

The filesets involved are: 

For the IBM Virtual Shared Disk component: 

• vsd.vsdd 

• vsd.sysctl 

• vsd.cmi 

For the Hashed Shared Disk (HSD) component: 

• vsd.hsd 

If you are working with PSSP 2.4 or an earlier level, use the corresponding 
fileset names listed in Table 26. 

- Note - 

The IBM Virtual Shared Disk Perspective component is in ssp.vsdgui. The 
PostScript file for VSD manual and the man pages for the related 
commands are contained in ssp.docs. They are in the ssp install image, 
which should be installed on the control workstation. 


The filesets to be installed are as follows: 

On the control workstation: 

• vsd.vsdd 

• vsd.sysctl 

• vsd.cmi 

• ssp.vsdgui (if you want to use the VSD Perspective) 

On the VSD client and server nodes: 

• vsd.vsdd 

• vsd.sysctl 

If you are going to use HSD, then on HSD server and client nodes: 

• vsd.hsd 

12.2.2 Establishing authorization 

The IBM Virtual Shared Disk component uses sysctl for configuration and 
management. Your Kerberos principal has to be listed in the VSD ACL file in 
order to execute any VSD configuration command. The file /etc/sysctl.vsd.acl 
is shown in Figure 122. 


#acl# 
# These are the users that can issue sysctl_vsdXXX command on this node 
# Name must have a Kerberos name format which defines user@realm 
# Please check your security administrator to fill in correct realm name 
# you may find realm name from /etc/krb.conf 
# _PRINCIPAL root@PPD.POK.IBM.COM 
_PRINCIPAL root.admin@MSC.ITSO.IBM.COM 
# _PRINCIPAL rcmd@PPD.POK.IBM.COM 
# _PRINCIPAL userid@PPD.POK.IBM.COM 


Figure 122. The sysctl.vsd.acl file 

This file should be copied to all the nodes where VSD has been installed. 
Once copied, check that you have authorization to the VSD nodes. 

To check your sysctl authorization, first run the klist command to look at 
your ticket and then run the sysctl whoami command and compare both: 

[sp3en0:/]# klist 
Ticket file: /tmp/tkt0 
Principal: root.admin@MSC.ITSO.IBM.COM 

Issued            Expires           Principal 
Dec  4 14:34:30   Jan  3 14:34:30   krbtgt.MSC.ITSO.IBM.COM@MSC.ITSO.IBM.COM 
Dec  4 14:43:22   Jan  3 14:43:22   rcmd.sp3en0@MSC.ITSO.IBM.COM 
Dec  4 14:43:43   Jan  3 14:43:43   hardmon.sp3en0@MSC.ITSO.IBM.COM 
Dec  4 14:56:04   Jan  3 14:56:04   rcmd.sp3n01@MSC.ITSO.IBM.COM 
Dec  4 14:56:04   Jan  3 14:56:04   rcmd.sp3n05@MSC.ITSO.IBM.COM 
Dec  4 14:56:04   Jan  3 14:56:04   rcmd.sp3n06@MSC.ITSO.IBM.COM 
Dec  4 14:56:04   Jan  3 14:56:04   rcmd.sp3n08@MSC.ITSO.IBM.COM 
Dec  4 14:56:04   Jan  3 14:56:04   rcmd.sp3n07@MSC.ITSO.IBM.COM 
Dec  4 14:56:04   Jan  3 14:56:04   rcmd.sp3n09@MSC.ITSO.IBM.COM 
Dec  4 14:56:04   Jan  3 14:56:04   rcmd.sp3n10@MSC.ITSO.IBM.COM 
Dec  4 14:56:05   Jan  3 14:56:05   rcmd.sp3n11@MSC.ITSO.IBM.COM 
Dec  4 14:56:05   Jan  3 14:56:05   rcmd.sp3n13@MSC.ITSO.IBM.COM 
Dec  4 14:56:05   Jan  3 14:56:05   rcmd.sp3n12@MSC.ITSO.IBM.COM 
Dec  4 14:56:05   Jan  3 14:56:05   rcmd.sp3n14@MSC.ITSO.IBM.COM 
Dec  4 14:56:05   Jan  3 14:56:05   rcmd.sp3n15@MSC.ITSO.IBM.COM 

[sp3en0:/]# sysctl whoami 

root.admin@MSC.ITSO.IBM.COM 
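
The comparison between the two outputs can also be automated. The following Python sketch extracts the Principal line from klist output and compares it with what sysctl whoami returned. It assumes that klist prints a line beginning with Principal:, as in the listing above. The function names are hypothetical, and the sample text is abbreviated from the example output.

```python
def klist_principal(klist_output):
    """Return the principal named on the 'Principal:' line, or None."""
    for line in klist_output.splitlines():
        line = line.strip()
        if line.startswith("Principal:"):
            return line.split(":", 1)[1].strip()
    return None


def principals_match(klist_output, whoami_output):
    """True if the ticket's principal equals the sysctl whoami identity."""
    principal = klist_principal(klist_output)
    return principal is not None and principal == whoami_output.strip()


# Abbreviated from the example output; only the first lines are needed.
KLIST_SAMPLE = (
    "Ticket file: /tmp/tkt0\n"
    "Principal: root.admin@MSC.ITSO.IBM.COM\n"
    "\n"
    "Issued            Expires           Principal\n"
)
WHOAMI_SAMPLE = "root.admin@MSC.ITSO.IBM.COM\n"

print(principals_match(KLIST_SAMPLE, WHOAMI_SAMPLE))
```

A mismatch, or a missing Principal line, suggests that the ticket in use is not the one that sysctl is evaluating against the ACL file.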

To check that you can run VSD multinode commands, use the following 
command: 

[sp3en0:/]# vsdsklst -n 1,15 
>> sp3n01.msc.itso.ibm.com 
Node Number:1; Node Name:sp3n01.msc.itso.ibm.com 
Volume group:rootvg; Partition Size:4; Total:537; Free:233 
Physical Disk:hdisk0; Total:537; Free:233 
Not allocated physical disks: 
Physical disk:hdisk1; Total:2.2 
<< 
>> sp3n15.msc.itso.ibm.com 
Node Number:15; Node Name:sp3n15.msc.itso.ibm.com 
Volume group:rootvg; Partition Size:4; Total:958; Free:665 
Physical Disk:hdisk0; Total:479; Free:311 
Physical Disk:hdisk3; Total:479; Free:354 
Not allocated physical disks: 
Physical disk:hdisk1; Total:2.0 
Physical disk:hdisk2; Total:2.0 
<< 

This command lists physical disk and Logical Volume Manager information as
seen by the IBM Virtual Shared Disk software.

In this case, VSD has been installed and configured on node 1 and node 15.

12.2.3 Configuring 

At this point in the installation, you are required to define and enter disk 
parameters for the VSD nodes into the System Data Repository (SDR). 

This can be done through the vsdnode command or the IBM Virtual Shared
Disk Perspective graphical interface (spvsd command). The syntax for the
vsdnode command is as follows:

Usage: vsdnode node_number ... adapter_name init_cache_buffer_count
max_cache_buffer_count vsd_request_count rw_request_count
min_buddy_buffer_size max_buddy_buffer_size max_buddy_buffers
VSD_maxIPmsgsz

For example, to define and configure nodes 1 and 15, we should run the 
following command: 

vsdnode 1 15 css0 256 256 256 48 4096 262144 2 61440
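
Because vsdnode takes a long run of positional arguments, it can help to
label each value against the usage line before running the command. The
following Python sketch does only that; it does not call vsdnode, and the
helper name label_vsdnode_args is hypothetical. It assumes that everything
before the last nine arguments is a node number, as the usage line suggests.

```python
# Field names taken from the vsdnode usage line, in order.
VSDNODE_FIELDS = [
    "adapter_name",
    "init_cache_buffer_count",
    "max_cache_buffer_count",
    "vsd_request_count",
    "rw_request_count",
    "min_buddy_buffer_size",
    "max_buddy_buffer_size",
    "max_buddy_buffers",
    "VSD_maxIPmsgsz",
]


def label_vsdnode_args(command: str) -> dict:
    """Map the positional arguments of a vsdnode command line to names."""
    words = command.split()
    if not words or words[0] != "vsdnode":
        raise ValueError("not a vsdnode command")
    args = words[1:]
    n_fixed = len(VSDNODE_FIELDS)
    if len(args) <= n_fixed:
        raise ValueError("expected at least one node number")
    # Leading arguments are node numbers; the last nine are the parameters.
    labelled = {"node_numbers": [int(n) for n in args[:-n_fixed]]}
    labelled.update(zip(VSDNODE_FIELDS, args[-n_fixed:]))
    return labelled


example = "vsdnode 1 15 css0 256 256 256 48 4096 262144 2 61440"
for name, value in label_vsdnode_args(example).items():
    print(f"{name:26} {value}")
```

Read this way, the example defines nodes 1 and 15 on adapter css0 with a
minimum buddy buffer size of 4096 and a maximum of 262144.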

Or, we may use the VSD Perspective as shown in Figure 123.

Figure 123. IBM Virtual Shared Disk perspective

Once the nodes have been designated, we can start creating VSD disks on
the designated nodes. To create a VSD disk, you first have to decide which
volume group you are going to use. It can be rootvg or a global volume group
you have previously created.

Volume groups used for virtual shared disks must be given a global name 
that is unique across system partitions. 

This task is always required, but you do not always have to perform it
yourself. The Create... actions and the comparable createvsd and createhsd
commands do this for you. You only have to do this explicitly if you need to
use the Define... action or the defvsd command to create your virtual shared
disks because you already have logical volumes.

You can use the Run Command... action and run the vsdvg command to 
define global volume groups. 

12.2.4 Creating virtual shared disks 

Remember, your procedure is based on whether or not you already have 
logical volumes. The Create... actions and commands take care of logical 
volumes and global volume groups for you. If you already have them, you 
must do the define steps instead. 

If you are using VSD to create the logical volumes and define the global 
volume groups for you, then it is a good idea to check old rollback files. Refer 
to PSSP: Managing Shared Disks, SA22-7349, for details on how to check old 
rollback files. 

You can create virtual shared disks with the graphical user interface action
or from the command line (on both primary and secondary nodes if you have
the IBM Recoverable Virtual Shared Disk component running). You must first
have used the IBM Virtual Shared Disk Perspective or the vsdnode command to
set up information in the SDR about each node involved in this virtual shared
disk configuration.

To create virtual shared disks using the IBM Virtual Shared Disk Perspective,
launch the graphical interface using the spvsd command. Figure 124 on page
334 shows the initial start-up window.

In the main window, select View -> Add Pane... as shown in Figure 125 on
page 335. Once the new pane has been added, you can create virtual shared
disks by selecting the Create... option from the Actions menu.

Figure 124. IBM Virtual Shared Disk perspective (spvsd)

When creating virtual shared disks, you have to enter the pertinent 
information in the dialog box or as arguments to the createvsd command. 


The information you need to enter is: 

• The number of IBM VSDs per node. 

• The IBM VSD name prefix. 

• The logical volume name prefix. 

• The volume group name. 

• The IBM VSD size (MB).

• The mirroring count.

• The physical partition size (MB). 

• Select the nodes that have been designated as virtual shared disk nodes, 
the primary node, and the backup node. 

• Select the physical disks that the virtual shared disk is to span. 


Figure 125. Adding a VSD pane

The window for creating virtual shared disks is shown in Figure 126 on page 
336. 

If you prefer the command line interface instead, you can use the createvsd 
command as follows: 

createvsd -n 1,15 -s 4 -g ITSOVG -v ITSOVSD 


Figure 126. Creating virtual shared disks


This creates the following virtual shared disk definitions: 

• ITSOVSD1n1 on node 1. The local volume group name on node 1 is
ITSOVG. The global volume group name is ITSOVGn1. The logical
volume is lvITSOVSD1n1.

• ITSOVSD2n15 on node 15. The local volume group name on node 15 is
ITSOVG. The global volume group name is ITSOVGn15. The logical
volume is lvITSOVSD2n15.
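
The names generated in this example follow a visible pattern, which the
Python sketch below reproduces. The pattern is inferred only from this
example (for instance, lvITSOVSD1n1 and ITSOVSD2n15 in the command listings
of this section) with one virtual shared disk per node and no secondary
node; it is an assumption, not a documented rule, and createvsd_names is a
hypothetical helper.

```python
def createvsd_names(nodes, volume_group, vsd_prefix):
    """Derive the names seen in the example for one VSD per node.

    Assumption: VSDs are numbered 1, 2, ... in the order the nodes are
    given, and no secondary (backup) node is specified.
    """
    names = []
    for index, node in enumerate(nodes, start=1):
        vsd = f"{vsd_prefix}{index}n{node}"
        names.append({
            "node": node,
            "vsd": vsd,
            "local_vg": volume_group,
            "global_vg": f"{volume_group}n{node}",
            "lv": f"lv{vsd}",
        })
    return names


# Arguments of: createvsd -n 1,15 -s 4 -g ITSOVG -v ITSOVSD
for entry in createvsd_names([1, 15], "ITSOVG", "ITSOVSD"):
    print(entry)
```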


This can be verified on the two nodes where we have just created the
virtual shared disks:

[sp3en0:/usr/lpp/vsd]# dsh -w sp3n01,sp3n15 lsvg
sp3n01: rootvg
sp3n01: ITSOVG
sp3n15: rootvg
sp3n15: ITSOVG

[sp3en0:/usr/lpp/vsd]# dsh -w sp3n01,sp3n15 lsvg -l ITSOVG
sp3n01: ITSOVG:
sp3n01: LV NAME        TYPE  LPs  PPs  PVs  LV STATE      MOUNT POINT
sp3n01: lvITSOVSD1n1   jfs   1    1    1    closed/syncd  N/A
sp3n15: ITSOVG:
sp3n15: LV NAME        TYPE  LPs  PPs  PVs  LV STATE      MOUNT POINT
sp3n15: lvITSOVSD2n15  jfs   1    1    1    closed/syncd  N/A


No secondary nodes are defined. The space allocated to a virtual shared disk 
is spread across all the physical disks (hdisks) within its local volume group 
on each node (1 and 15). 

To assign each disk in the previous example a secondary node (with the IBM 
Recoverable Virtual Shared Disk component running), type: 

createvsd -n 1/5/,15/6/ -s 4 -g ITSOVG -v ITSOVSD 

This creates the following virtual shared disk definitions: 

• ITSOVSD1n1 on node 1 with a twin-tailed connection to node 5. The local
volume group name on node 1 is ITSOVG. The global volume group name
is ITSOVGn1. The logical volume is lvITSOVSD1n1.

• ITSOVSD2n15 on node 15 with a twin-tailed connection to node 6. The
local volume group name on node 15 is ITSOVG. The global volume group
name is ITSOVGn15. The logical volume is lvITSOVSD2n15.

After you have created your virtual shared disks, you must configure them on 
all nodes that need to read from and write to them. 

If you want recoverability, you should also have installed the IBM
Recoverable Virtual Shared Disk software on each virtual shared disk node.
In this case, you can use the Control IBM RVSD subsystem... action from the
Nodes pane. After you set the state to Initial Reset, this action
automatically configures and activates all the virtual shared disks as soon
as quorum is met, and activates recoverability on all the virtual shared disk
nodes. If you prefer to use the command ha_vsd reset, you must run it on
each virtual shared disk node.

To configure all the virtual shared disks, you can use the IBM Virtual Shared
Disk Perspective (spvsd) graphical interface, or you can use the command
cfgvsd. From the graphical interface, select the nodes you want to configure
and then select Configure IBM VSDs... from the Actions menu. Figure 127
on page 338 shows the graphical window for configuring the virtual shared
disks we previously defined.

Figure 127. Configuring virtual shared disks


12.2.5 Changing states of virtual shared disks

After your virtual shared disks are configured, they are put into a stopped
state. Before they can be used, you have to start them. If you are using
the IBM Recoverable Virtual Shared Disk component, then this step is done
automatically.

To check the status of your virtual shared disks, you may use the lsvsd 
command as follows: 

# lsvsd -l
minor state server lv_major lv_minor vsd-name     option   size(MB)
1     ACT   1      34       1        ITSOVSD1n1   nocache  4
2     ACT   15     0        0        ITSOVSD2n15  nocache  4

The state column represents the state of the virtual shared disk. 
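
When many virtual shared disks are defined, the state column is easier to
check with a small script. The following Python sketch parses a listing in
the layout shown above and reports any disk whose state is not ACT. It
assumes the eight-column layout of this example; the column keys and the
function name parse_lsvsd are our own labels.

```python
LSVSD_COLUMNS = [
    "minor", "state", "server", "lv_major",
    "lv_minor", "vsd_name", "option", "size_mb",
]


def parse_lsvsd(output: str) -> list:
    """Turn the data rows of an 'lsvsd -l' listing into dictionaries."""
    records = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not an eight-field data row.
        if len(fields) != len(LSVSD_COLUMNS) or not fields[0].isdigit():
            continue
        records.append(dict(zip(LSVSD_COLUMNS, fields)))
    return records


sample = """minor state server lv_major lv_minor vsd-name option size(MB)
1 ACT 1 34 1 ITSOVSD1n1 nocache 4
2 ACT 15 0 0 ITSOVSD2n15 nocache 4
"""

not_active = [r["vsd_name"] for r in parse_lsvsd(sample) if r["state"] != "ACT"]
print("not active:", not_active)
```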

Before you start your virtual shared disks, you have to put the virtual
shared disks you just configured into a suspended state. To do this, use the
preparevsd command. Once the virtual shared disks are in a suspended state,
you can use the resumevsd command to make them active.
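
The forward path described in this section can be written down as a small
state table. The Python sketch below covers only the three commands named in
the text (cfgvsd, preparevsd, and resumevsd); the state name "defined" for a
disk that has not yet been configured is our assumption, and the sketch is a
reading aid, not a replacement for the commands themselves.

```python
# (current state, command) -> next state, for the commands named in the text.
TRANSITIONS = {
    ("defined", "cfgvsd"): "stopped",        # "defined" is an assumed name
    ("stopped", "preparevsd"): "suspended",
    ("suspended", "resumevsd"): "active",
}


def next_state(state: str, command: str) -> str:
    """Return the state reached by running command in the given state."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"{command} is not valid in state {state!r}") from None


def run(state: str, commands) -> str:
    """Apply a sequence of commands and return the final state."""
    for command in commands:
        state = next_state(state, command)
    return state


print(run("stopped", ["preparevsd", "resumevsd"]))
```

Trying resumevsd directly on a stopped disk raises an error in this model,
which mirrors the requirement to pass through the suspended state first.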

Figure 128 on page 339 shows all the possible states of a virtual shared disk
and the transitions between states.

Figure 128. Virtual shared disk states and associated commands


12.3 IBM Recoverable Virtual Shared Disks 

Recoverable Virtual Shared Disk (RVSD) adds availability to VSD. RVSD
allows you to twin-tail disks, that is, physically connect the same group of
disks to two or more nodes and provide transparent failover of VSDs among
the nodes.

Figure 129. RVSD function


With reference to Figure 129, Nodes X, Y, and Z form a group of nodes using 
VSD. RVSD is installed on Nodes X and Y to protect VSDs rvsd_X and 
rvsd_Y. Nodes X and Y physically connect to each other’s disk subsystem 
where the VSDs reside. Node X is the primary server for rvsd_X and the 
secondary server for rvsd_Y, and vice versa for Node Y. Should Node X fail,
RVSD will automatically fail over rvsd_X to Node Y. Node Y will take
ownership of the disks, vary on the volume group containing rvsd_X, and
make the VSD available. Node Y then serves both rvsd_X and rvsd_Y. Any I/O
operation that was in progress, and any new I/O operations against rvsd_X,
are suspended until failover is complete. When Node X is repaired and
rebooted, RVSD switches rvsd_X back to its primary, Node X.
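
The server selection in this scenario can be summarized in a few lines. The
Python sketch below is a simplified model of the behavior just described,
not RVSD code: it ignores quorum and the suspension of I/O during recovery,
and the function name serving_node is hypothetical.

```python
def serving_node(primary, secondary, nodes_up):
    """Return the node that serves a VSD, or None if neither server is up.

    The primary serves the VSD whenever it is available; otherwise the
    secondary takes over, and service returns to the primary once it is back.
    """
    if primary in nodes_up:
        return primary
    if secondary in nodes_up:
        return secondary
    return None


# rvsd_X: primary Node X, secondary Node Y (as in Figure 129).
print(serving_node("X", "Y", {"X", "Y", "Z"}))  # normal operation
print(serving_node("X", "Y", {"Y", "Z"}))       # Node X has failed
print(serving_node("X", "Y", {"X", "Y", "Z"}))  # Node X repaired and rebooted
```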


RVSD subsystems are shown in Figure 130 on page 341. The rvsd daemon 
controls recovery. It invokes the recovery scripts whenever there is a change 
in the group membership. When a failure occurs, the rvsd daemon notifies all 
surviving providers in the RVSD node group so that they can begin recovery. 
Communication adapter failures are treated the same as node failures. 

The hc daemon is also called the Connection Manager. It supports the
development of recoverable applications. The hc daemon maintains a
membership list of the nodes that are currently running hc daemons and an
incarnation number that is changed every time the membership list changes.
The hc daemon shadows the rvsd daemon, recording the same changes in
state and management of VSD that rvsd records. The difference is that hc
only records these changes after rvsd processes them to ensure that RVSD
recovery activities begin and complete before the recovery of hc client
applications takes place. This serialization helps ensure data integrity.

Figure 130. RVSD subsystems and HAI


12.4 General Parallel File Systems 

GPFS provides a standard, robust file system for serial and parallel 
applications on the SP. From a user’s view, it resembles NFS, but unlike 
NFS, the GPFS file system can span multiple disks on multiple nodes. GPFS 
exploits VSD technology and the Kerberos-based security features of the SP 
and, thus, is only supported on SP systems. 

A user sees a GPFS file system as a normal file system. Although it has its 
own support commands, usual file system commands, such as mount and df, 
work as expected on GPFS. GPFS file systems can be flagged to mount 
automatically at boot time. GPFS supports relevant X/OPEN standards with a 
few minor exceptions. Large NFS servers, constrained by I/O performance, 
are likely candidates for GPFS implementations. 

GPFS is implemented as kernel extensions, a multi-threaded daemon, and a
number of commands. The kernel extensions are needed to implement the
virtual file system layer that presents a GPFS file system to applications as a
local file system. In the first version of GPFS (GPFS v1.1), the locking
mechanism, also called token management, was implemented as a kernel
extension. In the second version currently available (GPFS v1.2), the token
management facility has been moved to user space as part of the GPFS
daemon (mmfsd).

The multi-threaded daemon provides specific functions within GPFS. 
Basically, the daemon provides data and metadata management (such as 
disk space allocation, data access, and disk I/O operations). It also provides 
security and quota management. 

The GPFS daemon runs on every node participating in the GPFS domain and
may take on different personalities. Because GPFS is not a client-server
file system in the way that NFS or AFS is, it uses the concept of VSD
servers, which are nodes physically connected to disks. Each node running
GPFS (including VSD servers) uses the virtual shared disk extensions to
access the data disks.

GPFS works within a system partition, and any node in this partition running
GPFS is able to access any defined GPFS file system. In order to access
the file systems created in GPFS, nodes need to mount them like any other
file system. To mount the file systems, nodes have two options:

• Nodes running GPFS 

For these nodes, mounting a GPFS file system is the same as mounting 
any local (JFS) file system. The mounting has no syntax difference with 
the local mounting done with JFS. At creation time, GPFS file systems can 
be set to be mounted automatically when the nodes start up. 

• Nodes not running GPFS 

For these nodes, GPFS file systems can be made available through NFS.
After mounting the file systems, nodes running GPFS can NFS-export them.
The same applies to any NFS-capable machine.

12.4.1 Requirements 

The GPFS environment is specific to AIX on the RS/6000 SP. Various software
requirements must be installed and configured correctly before you can
create a GPFS file system.

12.4.1.1 Hardware requirements

GPFS runs only on the RS/6000 SP, and the switch must be installed and
configured. Although GPFS does not require twin-tailed disks or SSA loops,
it is recommended to install such configurations in order to provide
higher data availability at the hardware level.

12.4.1.2 Software requirements

There are two versions of GPFS available at the time this redbook is being
written. GPFS v1.1 requires PSSP 2.4, which requires AIX 4.2.1 or AIX 4.3.
GPFS v1.2 requires PSSP 3.1, which, in turn, requires AIX 4.3.2.

GPFS also requires the IBM Virtual Shared Disk and the IBM Recoverable
Virtual Shared Disk products, whose levels are determined by the level of
PSSP installed. So, if PSSP 2.4 is installed, VSD and RVSD Version 2.1.1 are
required. If PSSP 3.1 is used, then VSD and RVSD 3.1 are required.

GPFS requires RVSD even if your installation does not have twin-tailed
disks or SSA loops for multi-host disk connection.

12.4.2 Configuring GPFS 

Chapter 2 of General Parallel File System for AIX: Installation and 
Administration Guide, SA22-7278, is devoted to a series of steps in planning 
GPFS. It is recommended that this section be read and understood prior to 
installing and using GPFS. 

GPFS tasks cannot be done on the CWS; they must be performed on one of 
the GPFS nodes. 

There are three areas of consideration when GPFS is being set up: the nodes
using GPFS, the VSDs to be used, and the FS to be created. Each area is
now examined. A sample FS setup consisting of four nodes is provided.
Nodes 12, 13, and 14 are GPFS nodes, while node 15 is the VSD server
node.


-Note- 

Do not attempt to start the mmfsd daemon prior to configuring GPFS. 
Starting the mmfsd daemon without configuring GPFS causes dummy 
kernel extensions to be loaded, and you will be unable to create a FS. If 
this occurs, configure GPFS and then reboot the node(s). 

Carry out the following procedures to configure GPFS, then start the 
mmfsd daemon to continue creating the FS. 


Nodes 

The first step in setting up GPFS is to define which nodes are GPFS nodes.
The second step is to specify the parameters for each node.

There are three areas where nodes can be specified for GPFS operations:
Node count, node list, and node preferences.

The Node Count is an estimate of the maximum number of nodes that will 
mount the FS and is entered into the system only when the GPFS FS is 
created. It is recommended to overestimate this number. This number is used 
in the creation of GPFS data structures that are essential for achieving the 
maximum degree of parallelism in file system operations. Although a larger 
estimate consumes a bit more memory, insufficient allocation of GPFS data 
structures can limit a node’s ability to process certain parallel requests 
efficiently, such as the allotment of disk space to a file. If it is not possible to 
estimate the number of nodes, apply the default value of 32. A larger number 
may be specified if more nodes are expected to be added. However, it is
important to avoid wildly overestimating since this can affect buffer
operations. This value cannot be changed later; to use a different value,
the FS must be destroyed and re-created.

A node list is a file that specifies to GPFS the actual nodes to be included in 
the GPFS domain. This file may have any file name. However, when GPFS 
configures the nodes, it copies the file to each GPFS node as 
/etc/cluster.nodes. The GPFS nodes are listed one per line in this file, and the 
switch interface is to be specified because this is the interface over which 
GPFS runs. 

Figure 131 on page 345 is an example of a node list file. The file name in this
example is /var/mmfs/etc/nodes.list.

Having chosen the nodes that form the GPFS domain, there is the option to
choose which of these nodes are to be considered for the personality of stripe
group manager. There are only three nodes in the GPFS domain in this
example; so, this step is unnecessary. However, if there are a large number
of nodes in the GPFS domain, it may be desirable to restrict the role of stripe
group manager to a small number of nodes. This way, if something happens
and a new stripe group manager has to be chosen, GPFS can do so from a
smaller set of the nodes (the default is every GPFS node). To carry this out,
follow the format for creating a node list to create the file
/var/mmfs/etc/cluster.preferences (this exact file name must be used).

To configure GPFS, you can use SMIT panels or the mmconfig command. The
mmconfig command is further described in General Parallel File System for
AIX: Installation and Administration Guide, SA22-7278. The SMIT panel
may be accessed by typing smit gpfs and then selecting the Create Startup
Configuration option. Figure 132 on page 346 shows the SMIT panel used to
configure GPFS (this is being run on node 12 in our example). This step
needs to be run on only one node in a GPFS domain.

Figure 132. SMIT panel for configuring GPFS


It is possible to configure GPFS to automatically start on all nodes whenever 
they come up. Simply specify yes to the autoload option in the SMIT panel or 
the -a flag in the mmconfig command. This eliminates the need to manually 
start GPFS when nodes are rebooted. 

The pagepool and mallocsize options specify the size of the cache on each
node dedicated to GPFS operations. mallocsize sets an area dedicated to
holding GPFS control structure data, while pagepool is the actual size of the
cache on each node. In this instance, pagepool is set to the default size
of 4M and mallocsize to the default of 2M, where M stands for megabytes and
must be included in the field. The maximum values per node are 512 MB for
pagepool and 128 MB for mallocsize.
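
Before entering these values in the SMIT panel, it can be useful to check
them against the limits quoted above. The following Python sketch is a
simple pre-check and nothing more: it knows only the defaults and maximums
stated in this section, treats any positive whole number of megabytes up to
the maximum as acceptable (an assumption, since no minimum is given here),
and the function names are hypothetical.

```python
# Maximums and defaults as quoted in the text, in megabytes.
MAX_MB = {"pagepool": 512, "mallocsize": 128}
DEFAULTS = {"pagepool": "4M", "mallocsize": "2M"}


def parse_megabytes(value: str) -> int:
    """Parse a value such as '4M'; the trailing M is mandatory."""
    if not value.endswith("M"):
        raise ValueError(f"{value!r}: the M suffix must be included")
    return int(value[:-1])


def check_option(name: str, value: str) -> int:
    """Return the size in MB if it is within the documented maximum."""
    size = parse_megabytes(value)
    if not 0 < size <= MAX_MB[name]:
        raise ValueError(f"{name}={value} is outside 1M..{MAX_MB[name]}M")
    return size


for option, default in DEFAULTS.items():
    print(option, check_option(option, default), "MB")
```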

The priority field refers to the scheduling priority for the mmfsd daemon. The
concept of priority is beyond the scope of this redbook. Please refer to the
AIX documentation for more information.

Notice the file /usr/lpp/mmfs/samples/mmfs.cfg.sample. This file contains the
default values used to configure GPFS if none are specified either
through the fields in the SMIT panel or in another file. The use of another file
to set GPFS options may appeal to more experienced users or those who
want to configure multiple GPFS domains with the same parameters. Simply
copy this file (/usr/lpp/mmfs/samples/mmfs.cfg.sample) to a different file,
make the changes according to your specifications, propagate it out to the
nodes, and configure GPFS using SMIT or the mmconfig command.

Further information, including details regarding the values to set for pagepool
and mallocsize, is available in the manual General Parallel File System for
AIX: Installation and Administration Guide, SA22-7278.

Once GPFS has been configured, mmfsd has to be started on the GPFS 
nodes before a FS can be created. Here are the steps to do so: 

1. Set the WCOLL environment variable to target all GPFS nodes for the dsh 
command. PSSP: Administration Guide, SA22-7348, PSSP: Command 
and Technical Reference, SA22-7351, and IBM RS/6000 SP Management, 
Easy, Lean, and Mean, GG24-2563, all contain information on the WCOLL 
environment variable. 

2. Designate each of the nodes in the GPFS domain as an IBM VSD node. 

3. Ensure that the rvsd and hc daemons are active on the GPFS nodes.

-Note-

It is necessary to have set up at least one VSD. The rvsd and hc daemons do
not start unless they detect the presence of one VSD defined for the GPFS
nodes. This VSD may or may not be used in the GPFS FS; the choice is up
to you.


4. Start the mmfsd daemon by running the following command on one GPFS node:

dsh startsrc -s mmfs

The mmfsd daemon starts on all the nodes specified in the /etc/cluster.nodes
file. If the startup is successful, the file /var/adm/ras/mmfs.log* looks
like Figure 133 on page 348.

Figure 133. Sample output of /var/adm/ras/mmfs.log*


VSDs 

Before the FS can be created, the underlying VSDs must be set up. The 
nodes with the VSDs configured may be strictly VSD server nodes, or they 
can also be GPFS nodes. The application needs to be studied, and a decision 
needs to be made as to whether the VSD server nodes are included in the 
GPFS domain. 

A decision also needs to be made regarding the level of redundancy used to
guard against failures. Should the VSDs be mirrored? Should they run with a
RAID subsystem on top? Should RVSD be used in case of node failures?
Again, this depends on the application, but it can also depend on your
own tolerance for risk.

In addition to these options, GPFS provides two further recovery strategies at 
the VSD (disk) level. 

GPFS organizes disks into a number of failure groups. A failure group is
simply a set of disks that share a common point of failure. A common point of
failure is defined as that which, if it goes down, causes the set of disks to
become simultaneously unavailable. For example, if a VSD spans two
physical disks within one node, the two disks can be considered a failure
group because if the node goes down, both disks become unavailable.

Recall that there are two types of data that GPFS handles: metadata and the
data itself. GPFS lets you decide what is stored on each VSD: metadata only,
data only, or data and metadata. It is possible to separate metadata and data
to ensure that data corruption does not affect the metadata and vice versa.
Separating them can also affect performance. This is best seen if RAID is
involved. RAID devices are not suited for handling metadata because metadata
is small in size and can be handled using small I/O block sizes. RAID is most
effective at handling large I/O block sizes. Metadata can, therefore, be stored
in a non-RAID environment, such as mirrored disks, while the data can be
stored on a RAID disk. This protects both data and metadata and maximizes
the performance offered by RAID.

Once the redundancy strategy has been adopted, there are two choices for
creating VSDs: have GPFS create them for you or create them manually. Either
way, this is done through the use of a Disk Descriptor file. This file can be
set up manually or through SMIT panels. If using SMIT, run smit gpfs
and then select the Prepare Disk Descriptor File option. Figure 134 on page
350 shows the SMIT panel for our example.

In this case, the VSD vsd1n15 has already been created on node 15
(sp3n15). Do not specify a name for the server node because the system has
all of the information it needs from the configuration files in the SDR. In
addition, the VSD(s) must be in the Active state on the VSD server node and
all the GPFS nodes prior to the GPFS FS creation.

If the VSDs have not been created, specify the name of the disk (such as 
hdisk3) in the disk name field instead of vsd1n15 and specify the server 
where this hdisk is connected. GPFS then creates the necessary VSDs to 
create the FS. 

The failure group number may be system generated or user specified. In this
case, a number of 1 is specified. If no number is specified, the system
provides a default number that is equal to the VSD server node number +
4000.

Figure 134. SMIT panel for Creating Disk Descriptor file

File system 

There are two ways to create a GPFS FS: Using SMIT panels or the mmcrfs
command. Figure 135 on page 351 shows the SMIT panel to be used. This is
accessed by running smit gpfs and then selecting the Create File System
option. Details on mmcrfs can be found in General Parallel File System for
AIX: Installation and Administration Guide, SA22-7278.

Figure 135. SMIT panel for creating a GPFS FS

Before creating the FS, several decisions have to be made: 

• Decide how to structure the data in the FS. 

There are three factors to consider for structuring the data in the FS: Block 
size, i-node size, and indirect block size. 

GPFS offers a choice of three block sizes for I/O to and from the FS: 16 
KB, 64 KB, or 256 KB. Familiarity with the applications running on your 
system will help you determine which block size to use. If the application 
handles large amounts of data in a single read/write operation, then a 
large block size may prove more suitable. If the size of the files handled by 
the application is small, a smaller block size may be more suitable. The 
default is 256 KB. 

GPFS further divides each block into 32 sub-blocks. While the block
size is the largest amount of data that can be accessed in a single I/O
operation, the sub-block is the smallest unit of disk space that can be
allocated to a file. For a block size of 256 KB, GPFS reads as much as 256
KB of data in a single I/O operation, and a small file can occupy as little as
8 KB of disk space.

Files smaller than one block size are stored in fragments, which are made
up of one or more sub-blocks. Large files are, therefore, often stored in a
number of full blocks plus one or more fragments to hold the data at the
end of the file.
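
The arithmetic behind blocks, sub-blocks, and fragments can be worked
through with a short Python sketch. It models only the data blocks of a file
as described above (full blocks plus a fragment of whole sub-blocks) and
ignores i-nodes and indirect blocks, so treat it as an illustration rather
than a statement of how GPFS accounts for space. The function names are
hypothetical.

```python
SUB_BLOCKS_PER_BLOCK = 32
BLOCK_SIZES_KB = (16, 64, 256)


def sub_block_bytes(block_kb: int) -> int:
    """Size of one sub-block, in bytes, for a supported block size."""
    if block_kb not in BLOCK_SIZES_KB:
        raise ValueError("block size must be 16, 64, or 256 KB")
    return block_kb * 1024 // SUB_BLOCKS_PER_BLOCK


def data_allocation(file_bytes: int, block_kb: int = 256):
    """Return (full blocks, sub-blocks in the fragment, bytes allocated)."""
    block = block_kb * 1024
    sub = sub_block_bytes(block_kb)
    full_blocks, tail = divmod(file_bytes, block)
    fragment_sub_blocks = -(-tail // sub)  # round the tail up to sub-blocks
    allocated = full_blocks * block + fragment_sub_blocks * sub
    return full_blocks, fragment_sub_blocks, allocated


for kb in BLOCK_SIZES_KB:
    print(f"{kb:3} KB block -> {sub_block_bytes(kb)} byte sub-block")

# A 1000-byte file with the default 256 KB block size.
print(data_allocation(1000))
# A 600 KB file: two full blocks plus a fragment.
print(data_allocation(600 * 1024))
```

With the default block size, the 1000-byte file occupies a single 8 KB
sub-block, which matches the figure quoted in the text.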

The i-node is also known as the file index. It is the internal structure that
describes an individual file to AIX, holding such information as file size and
the time of the last modification to the file. In addition, an i-node points to
the location of the file on the hard disk. If the file is small, the i-node stores
the addresses of all the disk blocks containing the file data. If the file is
large, i-nodes point to indirect blocks that point to the disk blocks storing
the file data (indirect blocks are set aside specifically to hold only data
block addresses).

The default size of an i-node is 512 bytes. This number can increase to 4 
KB depending on the size of the files the application uses. 

An indirect block can be as small as a single sub-block or as large as a full 
block (up to an absolute maximum of 32 KB). The only additional 
requirement is that the value of an indirect block is a multiple of the size of 
a sub-block. 
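
The rule for indirect block sizes can be enumerated directly. The Python
sketch below lists the sizes that satisfy the constraints stated above (a
multiple of the sub-block size, no larger than a full block, and never more
than 32 KB). It restates the text only; whether mmcrfs accepts every one of
these values is not confirmed here, and the function name is hypothetical.

```python
MAX_INDIRECT_BYTES = 32 * 1024
SUB_BLOCKS_PER_BLOCK = 32


def valid_indirect_block_sizes(block_kb: int) -> list:
    """Sizes in bytes that satisfy the indirect block rule for a block size."""
    block = block_kb * 1024
    sub = block // SUB_BLOCKS_PER_BLOCK
    upper = min(block, MAX_INDIRECT_BYTES)
    # Every multiple of the sub-block size from one sub-block up to the limit.
    return list(range(sub, upper + 1, sub))


print(valid_indirect_block_sizes(256))
print(len(valid_indirect_block_sizes(16)), "candidate sizes for 16 KB blocks")
```

For a 256 KB block size, the 32 KB ceiling leaves only four candidates: 8,
16, 24, and 32 KB.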

It is also possible to specify the number of i-nodes to limit the maximum
number of files that can be created in the FS. In older versions of GPFS,
the maximum number of i-nodes is set at GPFS FS creation time and
cannot be changed afterwards. With GPFS v1.2, it is now possible to set a
limit at FS creation time and, if it proves necessary, change this upper
limit. The upper limit is changed by the mmchfs command, and the exact syntax
can be found in General Parallel File System for AIX: Installation and
Administration Guide, SA22-7278.

• Decide the striping method. 

GPFS automatically stripes data across VSDs to increase performance
and balance disk I/O. There are three possible striping algorithms that you
can choose for GPFS to implement: Round Robin, Balanced Random, and
Random. A striping algorithm may be set when a GPFS FS is first created
or can be modified as a FS parameter later on.

The three algorithms are now detailed: 

• Round Robin 

This is the default option the system chooses. Data blocks are written 
to one VSD at a time until all the VSDs in the FS have received a data 
block. The next round of writes will then write a block to each VSD in 
exactly the same order. 
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This method yields the best write performance. There is, however, a 
penalty when a disk is added or removed from the FS. When a disk is 
added or removed from the FS, a re-striping occurs. The round Robin 
method takes the longest amount of time among the three algorithms to 
handle this re-striping. 

• Balanced Random 

This method is similar to round Robin. When data blocks are written, 
one block is written to each VSD. When all the VSDs have received 
one block of data, the round begins. However, in balanced Random, 
the order in the second round is not the same as the first round. 
Subsequent rounds are similarly written to all VSDs but in an order 
different than that of the previous round. 

• Random 

As its name implies, there is no set algorithm for handling writes. Each 
data block is written to a VSD according to a random function. If data 
replication is required, GPFS does ensure that both copies of the data 
are not written to the same disk. 
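
The difference between the three algorithms can be illustrated with a small Python sketch. It is only a toy model of the write order described above, not GPFS code; the VSD names and the function names are hypothetical, and replication is ignored.

```python
import random


def round_robin(vsds, rounds):
    """Every round writes one block to each VSD in the same fixed order."""
    return [list(vsds) for _ in range(rounds)]


def balanced_random(vsds, rounds, rng):
    """Every round writes one block to each VSD, in an order that differs
    from the order of the previous round."""
    result = []
    previous = None
    for _ in range(rounds):
        order = list(vsds)
        rng.shuffle(order)
        # Re-shuffle if the order repeats the previous round
        # (a different order is only possible with more than one VSD).
        while len(order) > 1 and order == previous:
            rng.shuffle(order)
        result.append(order)
        previous = order
    return result


def random_placement(vsds, blocks, rng):
    """Each block goes to an independently chosen VSD."""
    return [rng.choice(vsds) for _ in range(blocks)]


vsds = ["vsd1", "vsd2", "vsd3"]   # hypothetical VSD names
rng = random.Random(0)            # fixed seed so the sketch is repeatable
print(round_robin(vsds, 2))
print(balanced_random(vsds, 3, rng))
print(random_placement(vsds, 6, rng))
```

In the first two functions every VSD receives exactly one block per round, so the disks stay evenly filled; with purely random placement that balance is only statistical.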

• Decide whether to use GPFS Quotas or not. 

GPFS quotas define the amount of space in the FS that a user or a group 
of users is allowed to use. Quotas operate with three parameters: soft 
limit, hard limit, and grace period. 

The hard limit is the maximum disk space and number of files that a user or 
group can accumulate. The soft limit is the level below which a user or group 
can safely operate. The grace period is used only for soft limits and defines the 
period of time during which a user or group can exceed the soft limit. 
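
How the three parameters interact can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the GPFS enforcement logic: the class, the function, and the units (blocks and seconds) are hypothetical, and only disk space is modeled, not the file count.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    soft_limit: int     # blocks; exceeding this starts the grace period
    hard_limit: int     # blocks; absolute ceiling
    grace_period: int   # seconds the soft limit may be exceeded


def may_allocate(quota, usage, request, seconds_over_soft):
    """Return True if `request` more blocks may be allocated."""
    new_usage = usage + request
    if new_usage > quota.hard_limit:
        return False    # the hard limit can never be exceeded
    if new_usage <= quota.soft_limit:
        return True     # at or below the soft limit: always safe
    # Between the soft and hard limits: allowed only during the grace period.
    return seconds_over_soft <= quota.grace_period


q = Quota(soft_limit=100, hard_limit=150, grace_period=3600)
print(may_allocate(q, usage=95, request=10, seconds_over_soft=0))
print(may_allocate(q, usage=120, request=10, seconds_over_soft=7200))
```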

The usage and limits data are stored in the quota.user and 
quota.group files that reside in the root directories of GPFS FSs. 

In a quota-enabled configuration, one node is automatically nominated as 
the quota manager whenever GPFS is started. The quota manager 
allocates disk blocks to the other nodes writing to the FS and compares 
the allocated space to the quota limits at regular intervals. In order to 
reduce the need for frequent space requests from nodes writing to the FS, 
the quota manager allocates more disk blocks than requested. 

Quotas can be turned on by switching the Activate Quotas entry as shown 
in Figure 135 on page 351 to Yes or by specifying the -q yes flag for the 
mmcrfs command. 

Quotas are further discussed in General Parallel File System for AIX: 
Installation and Administration Guide, SA22-7278. 

• Decide whether to replicate the files or not. 

At the FS level, GPFS provides an option to have additional copies of data 
and metadata be stored on the VSDs. This is above and beyond disk 
mirroring. Therefore, with both replication and mirroring turned on, it is 
possible to have a minimum of four copies of data being written. 

It is possible to replicate metadata, data, or both. The parameters for this 
are Max Meta Data Replicas and Max Data Replicas, which control the 
maximum factors of replication of metadata and data (respectively), and 
Default Meta Data Replica and Default Data Replicas, the actual factors of 
replication. Acceptable values are 1 or 2. 1 is the default and means no 
replication (only one copy), and 2 means replication is turned on (two 
copies). The Default values must be less than or equal to the Max values. 
In other words, the Max values grant permission for replication, while the 
Default values turn the replication on or off. 
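
The relationship between the Max and Default values can be expressed as a small validation routine. The Python sketch below only restates the rules given above (values of 1 or 2, Default not greater than Max); the function name is hypothetical and it does not call any GPFS command.

```python
def replication_enabled(max_replicas, default_replicas):
    """Validate one Max/Default pair and report whether replication is on."""
    for name, value in (("Max", max_replicas), ("Default", default_replicas)):
        if value not in (1, 2):
            raise ValueError(f"{name} replicas must be 1 or 2, got {value}")
    if default_replicas > max_replicas:
        raise ValueError("Default replicas must not exceed Max replicas")
    # Max grants permission; Default decides whether two copies are written.
    return default_replicas == 2


print(replication_enabled(2, 2))   # permitted and turned on
print(replication_enabled(2, 1))   # permitted but turned off
```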

Replication can be set at FS creation time and cannot be set through SMIT 
panels. The only way to turn on replication is with the command mmcrfs 
and the flags -M for Max Metadata Replicas, -m for Default Metadata 
Replicas, -R for Max Data Replicas, and -r for Default Data Replicas. 
Using the same example in Figure 135 on page 351, we can create a FS 
with both metadata and data replication turned on: 

mmcrfs /gpfs/fs1 fs1 -F /var/mmfs/etc/fs1desc -A yes -B 256K -i 512 -I 
16K -M 2 -m 2 -n 3 -R 2 -r 2 -v yes 

More information on these flags and the mmcrfs command can be found in 
General Parallel File System for AIX: Installation and Administration 
Guide, SA22-7278. 

Once a GPFS FS has been set up, it can be mounted or unmounted on the 
GPFS nodes using the AIX mount and umount commands. Or, you can use the 
SMIT panel by running smit fs and then selecting Mount File System. Figure 
136 on page 355 shows the SMIT panel for mounting a FS. 

Figure 136. SMIT panel for mounting a file system 


12.4.3 Managing GPFS 

Once a GPFS FS has been set up, there are a number of tasks that can be 
performed to manage it. Some of the tasks and the commands to execute 
them are included here for reference. Note that SMIT panels are available as 
well to execute the commands. The commands and the SMIT panels are 
further described in the manual General Parallel File System for AIX: 
Installation and Administration Guide, SA22-7278. 

Changing the GPFS configuration 

It is possible to change the configuration of GPFS for performance tuning 
purposes. The command to do so is mmchconfig, and it is capable of changing 
the following attributes: 

1. pagepool 

2. dataStructureDump 

3. mallocsize 

4. maxFilesToCache 

5. priority 

6. autoload 

Changes to pagepool may take effect immediately if the -i option is chosen; 
otherwise, the changes will take effect the next time GPFS is started. 
Changes to dataStructureDump, mallocsize, maxFilesToCache, and 
priority require a re-start of GPFS. Changes to autoload require a re-boot of 
the affected nodes. 

For example, to immediately change the size of pagepool to 60 MB, run: 

mmchconfig pagepool=60M -i 

It is also possible to add and delete nodes from a GPFS configuration. The 
commands to do so are mmaddnode and mmdelnode. Be careful when adding nodes to or 
deleting nodes from a GPFS configuration. GPFS uses quorum to 
determine if a GPFS FS stays mounted or not. It is easy to break the quorum 
requirement when adding or deleting nodes. Adding or deleting nodes 
automatically configures them for GPFS usage. Newly added nodes are 
considered GPFS nodes in a down state and are not recognized until a 
restart of GPFS. By maintaining quorum, you ensure that you can schedule a 
good time to refresh GPFS on the nodes. 

For example, consider a GPFS configuration of four nodes. The quorum is 
three. With all four nodes running, we can add or delete one node, and the 
quorum requirement is still satisfied. We can add up to three nodes into the 
GPFS group as long as all four current nodes stay up. If we try to add four 
nodes, the GPFS group consists then of eight nodes with a quorum 
requirement of five. However, at that point, GPFS can only see four nodes up 
(configured) and exits on all the current nodes. 
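
The arithmetic in this example can be checked with a short Python sketch. It assumes that quorum is half of the configured nodes plus one, which matches the figures above (three for four nodes, five for eight) but is an inference from this example rather than a quoted rule; the function names are hypothetical.

```python
def quorum(configured_nodes):
    """Quorum assumed to be half the configured nodes plus one."""
    return configured_nodes // 2 + 1


def stays_up(configured_nodes, nodes_up):
    """GPFS keeps running only while the number of nodes up meets quorum."""
    return nodes_up >= quorum(configured_nodes)


running = 4  # the four original nodes; newly added nodes count as down
for added in range(5):
    configured = running + added
    print(added, configured, quorum(configured), stays_up(configured, running))
```

The loop shows that adding up to three nodes keeps the four running nodes at or above quorum, while adding a fourth raises the requirement to five and breaks it.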

Deleting a FS 

Before deleting a GPFS FS, it must be unmounted from all GPFS nodes. The 
command to delete the FS is mmdelfs. 

For example, if we want to delete fs1, which we have created in Figure 135 
on page 351, we can run: 

umount fs1 on all GPFS nodes, then: 

mmdelfs fs1 

Checking and repairing a FS 

If a FS cannot be mounted, or if messages are received saying that a file 
cannot be read, it is possible to have GPFS check and repair any repairable 
damage to the FS. The FS has to be in the unmounted state for GPFS to 
check it. 

The command is mmfsck. This command checks for and repairs the following 
file inconsistencies: 

• Blocks marked allocated that do not belong to any file. The blocks are 
marked free. 

• Files for which an i-node is allocated, but no directory entry exists. 
mmfsck either creates a directory entry for the file in the /lost+found 
directory, or it destroys the file. 

• Directory entries pointing to an i-node that is not allocated. mmfsck 
removes the entries. 

• Ill-formed directory entries. They are removed. 

• Incorrect link counts on files and directories. They are updated with the 
accurate counts. 

• Cycles in the directory structure. Any detected cycles are broken. If the 
cycle is a disconnected one, the new top level directory is moved to the 
/lost+found directory. 
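
Three of these checks can be illustrated on a toy model. The Python sketch below is not how mmfsck is implemented; it assumes a hypothetical in-memory representation (a set of allocated i-node numbers, a list of directory entries, and a table of recorded link counts) purely to show what each inconsistency means.

```python
from collections import Counter


def check(allocated_inodes, dir_entries, link_counts):
    """Report orphaned i-nodes, dangling entries, and wrong link counts.

    allocated_inodes: set of i-node numbers marked allocated
    dir_entries:      list of (name, inode) pairs
    link_counts:      dict mapping i-node number to its recorded link count
    """
    referenced = Counter(
        inode for _, inode in dir_entries if inode in allocated_inodes
    )
    # Allocated i-nodes that no directory entry points to.
    orphans = sorted(allocated_inodes - set(referenced))
    # Directory entries that point to an i-node that is not allocated.
    dangling = sorted(
        name for name, inode in dir_entries if inode not in allocated_inodes
    )
    # Recorded link counts that disagree with the actual number of entries.
    wrong_links = {
        inode: count
        for inode, count in referenced.items()
        if link_counts.get(inode) != count
    }
    return orphans, dangling, wrong_links


print(check({1, 2, 3}, [("a", 1), ("b", 1), ("c", 9)], {1: 1, 2: 1, 3: 1}))
```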

FS attributes 

FS attributes can be listed with the mmlsfs command. If no flags are specified, 
all attributes are listed. For example, to list all the attributes of fs1, run: 

mmlsfs fs1 

To change FS attributes, use the mmchfs command. There are eight attributes 
that can be changed: 

1. Automatic mount of FS at GPFS startup 

2. Maximum number of files 

3. Default Metadata Replication 

4. Quota Enforcement 

5. Default Data Replication 

6. Stripe Method 

7. Mount point 

8. Migrate FS 

For example, to change the FS to permit data replication, run: 

mmchfs fs1 -r 2 

Querying and changing file replication attributes 

The command mmlsattr shows the replication factors for one or more files. If it 
is necessary to change them, use the mmchattr command. 

For example, to list the replication factors for a file /gpfs/fs1/test.file, run: 

mmlsattr /gpfs/fs1/test.file 

If the value turns out to be 1 for data replication, and you want to change this 
to 2, run: 

mmchattr -r 2 /gpfs/fs1/test.file 

Re-striping a GPFS FS 

If disks have been added to a GPFS FS, you may want to re-stripe the FS data 
across all the disks for system performance. This is particularly useful if the 
FS is seldom updated, and the data has not had a chance to propagate out to 
the new disk(s). To do this, run mmrestripefs. 

There are three options with this command, and one of the three must be 
chosen. The -b flag stands for rebalancing. This is used when you simply 
want to re-stripe the files across the disks in the FS. The -m flag stands for 
migration. This option moves all critical data from any suspended disk in the 
FS. Critical data is all data that would be lost if the currently suspended 
disk(s) are removed. The -r flag stands for replication. This migrates all data 
from a suspended disk and restores all replicated files in the FS according to 
their replication factor. 

For example, if a disk has been added to fs1, and you are ready to re-stripe the 
data onto this new disk, run: 

mmrestripefs fs1 -b 

Query FS space 

The AIX command df shows the amount of free space left in a FS. This can 
also be run on a GPFS FS. However, if information is needed on how balanced 
the GPFS FS is, the command to use is mmdf. This command is run against a 
specific GPFS FS and shows the VSDs that make up this FS and the amount 
of free space within each VSD. 

For example, to check on the GPFS FS fs1 and the amount of free space 
within each VSD that houses it, run: 

mmdf fs1 

12.4.4 Migration and coexistence 

The improvements in GPFS v1.2 have made it necessary that all nodes in a 
GPFS domain be at the same level of GPFS code. That is, in a GPFS 
domain, you cannot run both GPFS v1.1 and v1.2. 

However, it is possible to run multiple levels of GPFS code provided that 
each level is in its own group within one system partition. 

There are two possible scenarios to migrate to GPFS v1.2 from previous 
versions: full and staged. As its name implies, a full migration means that all 
the GPFS nodes within a system are installed with GPFS v1.2. A staged 
migration means that certain nodes are selected to form a GPFS group with 
GPFS v1.2 installed. Once you are convinced that this test group is safe, you 
may migrate the rest of your system. 

Migration and coexistence are further described in both PSSP 3.1 
Announcement, SG24-5332, and General Parallel File System for AIX: 
Installation and Administration Guide, SA22-7278. 


12.5 Related documentation 

Some extra documentation will help you better understand the different 
concepts and examples covered in this chapter. We recommend you take a 
look at some of these books in order to maximize your chances of success 
in the SP Certification exam. 

SP Manuals 

For the IBM Virtual Shared Disk (VSD) and the IBM Recoverable Virtual 
Shared Disk (RVSD), the manual PSSP: Managing Shared Disks, 
SA22-7349, is an excellent guide on installing and configuring the virtual disk 
technology, especially Chapters 1 to 6. 

For the General Parallel File System for AIX (GPFS), this manual will help 
you: General Parallel File System for AIX: Installation and Administration 
Guide, SA22-7278. 

SP Redbooks 

Redbooks are always good references. There are a couple of redbooks that 
you may want to take a look at. Inside the RS/6000 SP, SG24-5145, gives you 
a broad coverage of the different components in the RS/6000 SP. For 
VSD/RVSD and GPFS, we recommend you read Chapter 4, especially 4.7 
"Parallel I/O". Another redbook that covers this technology in much more 
detail is GPFS: A Parallel File System, SG24-5165. This book will 
provide you with practical information about installing, configuring, and 
managing R/VSD and GPFS. We recommend you read Chapters 1 and 2. 


12.6 Sample questions 

This section provides a series of questions to aid you in preparing for 
the certification exam. The answers to these questions can be found in 
Appendix A. 

1. Assuming the Working Collective is set to all nodes in the VSD Cluster, 
which command would most satisfactorily determine whether the VSDs 
are up and running on all the VSD nodes? 

A. dsh statvsd -a 

B. dsh lsvsd -l | pg 

C. dsh vsdatalst -a | pg 

D. SDRGetObjects VSD_Table CState==active 

2. You are in charge of installing, configuring, and starting a simple VSD 
configuration. Which of the following better describes the steps you will 
execute in order to get this done? 

A. 1) Create volume groups. 

2) Create logical volumes. 

3) Create virtual shared disks. 

4) Activate virtual shared disks. 

B. 1) Install the VSD and RVSD software. 

2) Designate the VSD nodes. 

3) Create virtual shared disks. 

4) Configure virtual shared disks. 

5) Start virtual shared disks. 

C. 1) Install the VSD and RVSD software. 

2) Designate the VSD nodes. 

3) Create virtual shared disks. 

4) Configure virtual shared disks. 

5) Prepare the virtual shared disks. 

6) Start virtual shared disks. 

D. 1) Install the VSD and RVSD software. 

2) Set authorization. 

3) Designate the VSD nodes. 

4) Create virtual shared disks. 

5) Configure virtual shared disks. 

6) Start virtual shared disks. 

3. What is the definition of a GPFS node? 

A. It is the server node that provides token management. 

B. It is the node that has GPFS up and running. 

C. It is the node that provides the data disks for the file system. 

D. It is the server node that has GPFS and VSD up and running. 

4. How do you start GPFS? 

A. startsrc -s mmfs 

B. startsrc -s gpfs 

C. dsh startsrc -s mmfs 

D. dsh startsrc -s gpfs 

5. Which of the following attributes is mmchconfig capable of changing? 

A. maxFilesToCache 

B. Quota Enforcement 

C. Default Data Replication 

D. Mount Point 

6. Which of the following is NOT a striping algorithm implemented by GPFS? 

A. Round Robin 

B. Balanced Random 

C. Random 

D. Balanced Round Robin 

7. Even though your installation does not have twin-tailed disks or SSA loops 
for multi-host disk connection, what does GPFS require? 

A. VSD 

B. RVSD 

C. NIS 

D. DNS 


12.7 Exercises 

Here are some exercises you may wish to perform: 

1. What are the necessary steps to create a VSD? 

2. What specific functions does the multi-threaded daemon 
provide within GPFS? 

3. What decisions have to be made before creating a GPFS FS? 

4. Familiarize yourself with the mmchconfig command. Which attributes is 
it capable of changing? 

5. Explain the capabilities of the mmfsck command. 

6. Explore which FS attributes the mmchfs command is capable of changing. 

Chapter 13. Problem management tools 


This chapter provides an overview and several examples for problem 
management by using the tools available within the RS/6000 SP. By problem 
management, we understand problem notification, log consolidation, and 
automatic recovery. 

This chapter covers this by first giving an explanation about the technology 
used by all the problem management tools available on the RS/6000 SP. It 
then describes two ways of using these tools and setting up monitors for 
critical components, such as memory, file system space, and daemons. The 
first method uses the command line interface through the Problem 
Management subsystem (PMAN), and the second method uses the 
graphical user interface (SP Event Perspective). 


13.1 Key concepts you should study 

Before taking the exam, make sure you understand the following concepts: 

• What is a resource monitor? 

• What is the configuration data for the Event Management subsystem, 
and where is it stored? 

• How to manage the Event Management daemons. 

• How to get authorization to use the Problem Management subsystem. 

• How to use the pmandef command. 

• How to define conditions and events through SP Event Perspectives. 


13.2 AIX service aids 

Basically, every node (and the control workstation) is an AIX machine. This 
means that all the problem determination tools available for standard 
RS/6000 machines are also available for SP nodes and control workstations. 

AIX provides facilities and tools for error logging, system tracing, and system 
dumping (creation and analysis). Most of these facilities are included in the 
bos.rte fileset within AIX and, therefore, are installed on every node and 
control workstation automatically. However, some additional facilities, 
especially tools, are included in an optionally installable package called 
bos.sysmgt.serv_aid that should be installed in your nodes and control 
workstation. 

13.2.1 Error logging facility 

The AIX error logging facility records hardware and software failures or 
informational messages in the error log. All of the AIX and PSSP subsystems 
will use this facility to log error messages or information about changes to 
state information. 

By analyzing this log, you can get an idea of what went wrong, when, and 
possibly why. However, the way the errpt command presents information 
makes it difficult to correlate errors within a single machine. This 
is much worse in the SP, where errors could be caused by components on 
different machines. We will get back to this point later in this chapter. 
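
To make the correlation problem concrete, the following Python sketch merges per-node log entries into a single timeline. It is only an illustration under stated assumptions: the entries are hypothetical (timestamp, label) pairs that have already been extracted from each node and sorted by time, not actual errpt output, and the node names and labels are invented.

```python
import heapq


def merge_logs(logs_by_node):
    """Merge per-node entries into one timeline ordered by timestamp.

    logs_by_node maps a node name to a list of (timestamp, label) tuples
    that is already sorted by timestamp.
    """
    streams = [
        [(timestamp, node, label) for timestamp, label in entries]
        for node, entries in logs_by_node.items()
    ]
    return list(heapq.merge(*streams))


logs = {
    "node1": [(100, "DISK_ERR"), (130, "SWITCH_DOWN")],
    "node5": [(110, "SWITCH_DOWN"), (120, "LINK_RESET")],
}
for timestamp, node, label in merge_logs(logs):
    print(timestamp, node, label)
```

Seeing the entries from all nodes in time order makes it easier to notice that a failure reported on one node was preceded by an event on another.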

The errdemon daemon keeps the log file updated based on information and 
errors logged by subsystems through the errlog facility or through the errsave 
facility if they are running at kernel level. In any case, the errdemon daemon 
adds the entries to the error log on a first-come, first-served basis. 

This error log facility also provides a mechanism through which you could 
create a notification object for specific log entries. You could instruct the 
errdemon daemon to send you an e-mail every time there is a hardware error. 
The PSSP Diagnosis Guide, GA22-7350, Section "Using the AIX Error Log 
Notification Facility" on page 72, provides excellent examples on setting up 
notification methods. 

Log analysis is not bad. However, log monitoring is much better. You do not 
really want to go and check the error log on every node within your 128-node 
installation. What you probably want to do instead is create some notification 
objects on your nodes to instruct the errdemon daemon on those nodes to notify 
you whenever a critical error is logged in the error log. 

PSSP provides facilities for log monitoring and error notification. This differs 
from AIX notification in the sense that although it uses the AIX notification 
methods, it provides a global view of your system; so, you could, for example, 
create a monitor for your AIX error log on all your nodes at once with a single 
command or a few clicks. 

13.2.2 Trace facility 

The trace facility is available through AIX. However, it comes in an optional fileset 
called bos.sysmgt.trace. Although the base system (bos.rte) includes minimal 
services for tracing, you need to install this optional component if you want to 
activate the trace daemon and generate trace reports. 

If you get to the point where a trace is needed, it is probably because all the 
conventional methods have failed. Tracing is a serious business; it involves 
commitment and dedication to understand the trace report. 

Tracing basically works in a two-step mode. You turn on the trace on selected 
subsystems and/or calls, and then you analyze the trace file through the 
report tools. 

The events that can be included or excluded from the tracing facility are listed 
in the /usr/include/sys/trchkid.h header file. They are called hooks and 
sub-hooks. With these hooks, you can tell the tracing facility which specific 
event you want to trace. For example, you could generate a trace for all the 
CREAT calls that include file creations. 

To learn more about tracing, refer to Chapter 11 "Trace Facility" of the AIX 
V4.3 Problem Solving Guide and Reference, SC23-4123. 

13.2.3 System dump facility 

AIX generates a system dump when a severe error occurs. A system dump 
can also be user-initiated by users with root authority. A system dump creates 
a picture of your system’s memory contents. 

In AIX v4, the default location for the system dump is the paging space (hd6). 
This means that when the system is started up again, the dump needs to be 
moved to a different location. By default, the final location of a system dump 
is the /var/adm/ras directory, which implies that the /var file system should 
have enough free space to hold this dump. The size of the dump depends on 
your system memory and load. It can be estimated (without causing a system 
dump) by using the sysdumpdev -e command. 

If there is not enough space in /var/adm/ras for copying the dump, the system 
will ask you what to do with this dump (throw it away, copy it to tape, and so 
on). This is changed for SP nodes since they usually do not have people 
staring at the console because there is no console (at least not a physical 
console). As on machines running AIX v3, the primary dump device is not 
hd6 but hd7 (a special dump device); so when the machine boots up, there is 
no need to move the dump since the device is not being used for anything 
else. Although your nodes are running AIX v4, which means the primary dump 
device would normally be hd6 (paging space), the /etc/rc.sp script changes it 
back to /dev/hd7 on every boot. 

A system dump certainly can help a lot in determining what took the machine 
out of service. A good system dump in the right hands can point to the guilty 
component. Keep in mind that a system dump is a copy of selected areas of 
the kernel. These areas contain information about the processes and routines 
running at the moment of the crash. However, for the operating system, it is 
easier to keep this information in memory address format. So, for a good 
system dump analysis, you will need the table of symbols that can be obtained 
from the operating system executable (/unix). Therefore, always save your 
system dumps along with the /unix file of the operating system 
on which the dump was produced. Support people will thank you. 

For more information on AIX system dump, refer to Chapter 12 "System 
Dump Facility" on page 81 of the AIX V4.3 Problem Solving Guide and 
Reference, SC23-4123. 


13.3 PSSP service aids 

PSSP provides several tools for problem determination. Therefore, in this 
sometimes complex environment, you are not alone. The facilities that PSSP 
provides range from log files being present on every node and the control 
workstation to SP Perspectives that utilize the RS/6000 Cluster Technology. 

13.3.1 SP log files 

Besides errors and information being logged into the AIX error log, most of 
the PSSP subsystems write to their own log files where, usually, the 
information you need for problem isolation and problem determination 
resides. 

Some components run only on the control workstation (such as the 
SDR daemon, the host respond daemon, the switch admin daemon, and so 
on), while others run only on nodes (such as the switch daemon). This needs to be 
taken into consideration in the search for logs. The PSSP Diagnosis Guide, 
GA22-7350, contains a complete list of PSSP log files and their location. 

Unfortunately, there is not a common rule for analyzing log files. They are 
very specific to each component, and, in most of the cases, they are created 
as internal debugging mechanisms and not for public consumption. 

In this redbook, we cover some of these log files and explain how to read 
them. However, this information may be obsolete for the next release of 
PSSP. The only official logging information is the AIX error log. However, 
nothing is stopping you from reading these log files. As a matter of fact, these 
SP log files sometimes are essential for problem determination. 

All the PSSP log files are located in the /var/adm/SPlogs directory. All the 
RSCT log files are located in the /var/ha/log directory. Since 
these two locations reside on the /var file system, make sure you have 
enough free space for holding all the logged information. Refer to RS/6000: 
Planning Volume 2, GA22-7281, for details on disk space requirements. 


13.4 Event Management 

Event Management (EM) provides applications with comprehensive monitoring 
of hardware and software resources in the system. A resource is simply an 
entity in the system that provides a set of services. CPUs execute 
instructions, disks store data, and database subsystems enable applications. 
You define what system events are of interest to your application, register 
them with EM, and let EM efficiently monitor the system. Should the event 
occur, EM will notify your application. Figure 137 illustrates EM’s functional 
design. 


[Figure: clients reach Event Management through the EMAPI; Event Management observes resource variables and notifies clients about events; Resource Monitors supply data through the RMAPI] 

Figure 137. EM Design 

EM gathers information on system resources using Resource Monitors 
(RMs). RMs provide the actual data on system resources to the 
event-determining algorithms of EM. RMs are integral to 
EM, but how do RMs get their data? Data-gathering mechanisms vary 
according to platform (for example, sampling CPU data in an AIX 
environment is implemented completely differently than in a Windows NT 
environment). The SP-specific implementation of resource data-gathering 
mechanisms is described later. 

EM is a distributed application, implemented by the EM daemon (haemd), 
running on each node and the CWS. Similar to Topology Services (TS) and 
Group Services (GS), EM is partition-sensitive; thus, the CWS may run 
multiple instances of haemd. To manage its distributed daemons, EM exploits 
GS. GS serves applications, such as EM. As EM must communicate reliably 
among its daemons, it uses the Reliable Messaging information built from TS. 
This is shown in Figure 138. 



Figure 138. EM client and peer communication 

EM receives resource data across the Resource Monitor Application 
Programming Interface (RMAPI). Clients communicate with EM through the 
Event Manager Application Programming Interface (EMAPI). An EM client 
can comprise many processes spread across nodes in a partition. A local 
process, that is, one executing on the same node as a given EM daemon, 
uses reliable UNIX domain sockets to talk to EM. On the CWS, a local 
process connects to the EM daemon that is running in the same system 
partition as the overall client. In this manner, the client can get events from 
anywhere in its partition. 

Remote clients, that is, clients executing in a separate partition or outside 
the SP entirely, use TCP/IP sockets, which is a less reliable method because 
the protocol cannot always properly deal with crashed communication 
sessions between programs. Remote clients usually connect only to the EM 
daemon on the CWS. When connecting, a remote client specifies the name of 
the target partition on the call to the EMAPI. The remote client will then 
connect to the EM daemon on the CWS that is running in the target partition. 
A client could connect directly to any EM daemon in the target partition and 
get the same events, but you would need an algorithm to determine the target 
node. It is easier to just connect to the appropriate daemon on the CWS. 

13.4.1 Resource monitors 

Resource monitors are programs that observe the state of specific system 
resources and transform this state into several resource variables. The 
resource monitors periodically pass these variables to the Event Manager 
daemon. The Event Manager daemon then applies expressions, which have 
been specified by EM clients, to each resource variable. If the expression is 
true, an event is generated and sent to the appropriate EM client. EM clients 
may also query the Event Manager daemon for the current values of the 
resource variables. 
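
The flow from resource monitor to EM client can be sketched as follows. This Python model is a simplification for study purposes and does not use the EMAPI or RMAPI; the class, its methods, the client name, and the resource variable name are all hypothetical.

```python
class EventManager:
    """Toy model: holds resource variable values and client expressions."""

    def __init__(self):
        self.values = {}
        self.registrations = []  # (client, variable, expression) triples

    def register(self, client, variable, expression):
        """A client registers an expression to apply to a resource variable."""
        self.registrations.append((client, variable, expression))

    def update(self, variable, value):
        """Accept a value from a resource monitor and return the events
        generated for clients whose expression is true."""
        self.values[variable] = value
        return [
            (client, variable, value)
            for client, registered, expression in self.registrations
            if registered == variable and expression(value)
        ]

    def query(self, variable):
        """Return the current value of a resource variable, if known."""
        return self.values.get(variable)


em = EventManager()
em.register("operator", "fs.var.percent_used", lambda value: value > 90)
print(em.update("fs.var.percent_used", 50))
print(em.update("fs.var.percent_used", 95))
print(em.query("fs.var.percent_used"))
```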

13.4.2 Configuration files 

Resource variables, resource monitors, and other related information are 
specified in several System Data Repository (SDR) object classes. 
Information stored in these SDR classes is then translated into a binary form 
that can be easily used by the Event Management subsystem. 

This EM database, called the Event Management Configuration Database 
(EMCDB), is produced by the haemcfg command from the information in the 
SDR. The format of the EMCDB is designed to permit quick loading of the 
database by the Event Manager daemon and the Resource Monitor API 
(RMAPI). It also contains configuration data in an optimized format to 
minimize the amount of data that must be sent between the Event Manager 
daemons and between an Event Manager daemon and its resource monitors. 

When the SDR data is compiled, the EMCDB is placed in a staging file. When 
the Event Manager daemon on a node or the control workstation initializes, it 
automatically copies the EMCDB from the staging file to a run-time file on the 
node or the control workstation. The run-time file is called 
/etc/ha/cfg/em.domain_name.cdb, where domain_name is the system partition 
name. 

Each time you execute the haemcfg command, or re-create the Event 
Management subsystem through the syspar_ctrl command, a new EMCDB 
file is created with a new version number. The new version number is stored 
in the Syspar SDR class as shown in Figure 139. 


[sp3en0:/]# SDRGetObjects Syspar haem_cdb_version 
haem_cdb_version 
913591595,334861568,0 


Figure 139. EMCDB version stored in the Syspar class 

To check the version number of the run-time version, you can use the 
following command: 


lssrc -ls haem.domain_name from the CWS 

or 

lssrc -ls haem from a node 

Because the Event Management subsystem is a distributed subsystem, all 
the Event Manager daemons have to use the same configuration information 
provided by the EMCDB. Using the same EMCDB version is vital. 
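
A version comparison of this kind can be sketched in Python. The sketch assumes you have already collected the version string from the SDR and from each daemon by hand; it does not parse lssrc output, and the host names and function names are hypothetical. Only the version string format is taken from Figure 139.

```python
def parse_version(text):
    """Turn a version string such as '913591595,334861568,0' into a tuple."""
    return tuple(int(part) for part in text.strip().split(","))


def mismatched(sdr_version, runtime_versions):
    """Return the hosts whose run-time EMCDB version differs from the SDR."""
    expected = parse_version(sdr_version)
    return sorted(
        host
        for host, version in runtime_versions.items()
        if parse_version(version) != expected
    )


runtime = {
    "cws": "913591595,334861568,0",
    "node1": "913591595,334861568,0",
    "node5": "913400000,100000000,0",   # hypothetical stale version
}
print(mismatched("913591595,334861568,0", runtime))
```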

The way in which Event Manager daemons determine the EMCDB version 
has important implications for the configuration of the system. To place a new 
version of the EMCDB into production (that is, to make it the run-time 
version), you must stop each Event Manager daemon in the domain after the 
haemcfg command is run. Stopping the daemons dissolves the existing peer 
group. Once the existing peer group is dissolved, the daemon can be 
restarted. To check if the peer group has been dissolved, use the following 
command: 

For PSSP 2.2/2.3/2.4: 

/usr/lpp/ssp/bin/hagsgr -s hags.domain_name | grep ha_em_peers 

For PSSP 3.1: 

/usr/sbin/rsct/bin/hagsgr -s hags.domain_name | grep ha_em_peers 

domain_name is added only if the command runs on the control workstation. 
The output from these commands should be null. 

Once the peer group is dissolved, the daemons can be restarted. As they 
restart, the daemons form a new peer group. 


13.5 Problem management 

The Problem Management subsystem (PMAN) is a facility, present on 
systems running PSSP V2.2 or later, used for problem determination, problem 
notification, and problem solving. It uses the RSCT infrastructure for 
monitoring conditions on behalf of authorized users and then generates 
actions accordingly. 

The PMAN subsystem consists of three components: 

• pmand - This daemon interfaces directly with the Event Manager daemon 
to register conditions and to receive notifications. This daemon runs on 
every node and the control workstation, and it is partition-sensitive (the 
control workstation may have more than one daemon running in case of 
multiple partitions). 

• pmanrmd - This is a resource monitor provided by PMAN to feed Event
Management with 16 additional user-defined variables. You can program
this resource monitor to periodically run a command or execute a script to 
update one of these variables. Refer to “Monitoring a log file” on page 372, 
for an example of how to use this facility. 

• sp_configd - Through this daemon, PMAN can send Simple Network 
Management Protocol (SNMP) traps to SNMP managers to report 
pre-defined conditions. 

13.5.1 Authorization 

In order to use the Problem Management subsystem, users need to obtain a 
Kerberos principal, and this principal needs to be listed in the access control 
list (ACL) file for the PMAN subsystem. This ACL file is managed by the sysctl 
subsystem and is located at /etc/sysctl.pman.acl. The content of this file is as 
follows: 

#acl#

# These are the kerberos principals for the users that can configure
# Problem Management on this node. They must be of the form as indicated
# in the commented out records below. The pound sign (#) is the comment
# character, and the underscore (_) is part of the "_PRINCIPAL" keyword,
# so do not delete the underscore.

#_PRINCIPAL root.admin@PPD.POK.IBM.COM
#_PRINCIPAL joeuser@PPD.POK.IBM.COM

_PRINCIPAL root.admin@MSC.ITSO.IBM.COM

In this case, the principal authorized to use the Problem Management 
subsystem is root.admin in the MSC.ITSO.IBM.COM realm. 
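
To see at a glance which principals are actually authorized, the uncommented records can be filtered out of the file. The following is a minimal sketch; the function name list_pman_principals is hypothetical, and the sketch assumes that each active record consists of the _PRINCIPAL keyword followed by the principal name, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) principals in a
# PMAN ACL file. Records that start with "#_PRINCIPAL" are comments and
# are skipped; only records whose first word is "_PRINCIPAL" are printed.
list_pman_principals() {
    acl_file=${1:-/etc/sysctl.pman.acl}
    awk '$1 == "_PRINCIPAL" { print $2 }' "$acl_file"
}
```

Called without an argument, the function reads /etc/sysctl.pman.acl; a different file can be passed for testing.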

Each time you make a change to this file, the sysctl subsystem must be 
refreshed. To refresh the sysctl subsystem, use the following command: 

refresh -s sysctld 

The pmandef command has a very particular syntax; so, if you want to give it a 
try, take a look at the PSSP: Command and Technical Reference, SA22-7351, 
on page 350 for a complete definition of this command. Chapter 25 "Using the 
Problem Management Subsystem" in PSSP: Administration Guide, 
SA22-7348, contains several examples and a complete explanation about 
how to use this facility. 

Finally, the /usr/lpp/ssp/install/bin/pmandefaults script is an excellent
starting point for using the PMAN subsystem. It has several examples of
monitors for daemons, log files, file systems, and so forth.

Monitoring a log file 

Now we know that the PMAN subsystem provides 16 resource variables for
user-defined events. In this section, we will use one of these variables to
monitor a specific condition that is not monitored by default by the PSSP
components.

Let us assume that you want to get a notification on the console’s screen 
each time there is an authentication failure for remote execution. We know 
that the remote shell daemon (rshd) logs these errors to
/var/adm/SPlogs/SPdaemon.log; so, we can create a monitor for this specific
error.

First, we need to identify the error that gets logged into this file every time 
somebody tries to execute a remote shell command without the 
corresponding credentials. Let us try and watch the error log file: 

Feb 27 14:30:16 sp3n01 rshd[17144]: Failed krb5_compat_recvauth 
Feb 27 14:30:16 sp3n01 rshd[17144]: Authentication failed from 
sp3en0.msc.itso.ibm.com: A connection is ended by software. 

From this content we see that Authentication failed seems to be a good
string to look for. So, the idea here is to notify the operator (console) that
there was a failed attempt to access this machine through the remote shell
daemon.


Now, there is a small problem to solve. If we are going to check this log file 
every few minutes, how do we know if the log entry is new, or if it was already 
reported? Fortunately, the way user-defined resource variables work is based 
on strings. The standard output of the script you associate with a 
user-defined resource variable is stored as the value of that variable. This 
means that if we print out the last Authentication failed entry every time,
the variable value will change only when there is a new entry in the log file. 
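
The idea can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the function name last_auth_failure is hypothetical, and the default log path is the one named in this section. Because the function prints only the most recent matching line, two consecutive runs produce identical output until a new failure is logged.

```shell
#!/bin/sh
# Print only the most recent "Authentication failed" line of a log file.
# The output stays identical between runs until a new failure is logged.
last_auth_failure() {
    logfile=${1:-/var/adm/SPlogs/SPdaemon.log}
    grep 'Authentication failed' "$logfile" | tail -n 1
}
```

If the log contains no matching line, the function prints nothing, so the value is an empty string until the first failure occurs.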

Let’s create the definition for a user-defined variable. To do this, PMAN needs
a configuration file that has to be loaded into the SDR by using the
pmanrmloadSDR command.

PSSP provides a template for this configuration file. It is located in the 
/spdata/sys1/pman directory on the control workstation. Let us make a copy
of this file and edit it: 

TargetType=NODE_RANGE
Target=0-5
Rvar=IBM.PSSP.pm.User_state1
SampInt=60
Command=/usr/local/bin/Guard.pl

In this file, you can define all sixteen user-defined variables (there must be
one stanza per variable). In this case, we have defined the
IBM.PSSP.pm.User_state1 resource variable. The resource monitor
(pmanrmd) will update this variable every 60 seconds as specified in the
sample interval (SampInt). The value of the variable will correspond to the
standard output of the /usr/local/bin/Guard.pl script. Let us see what the
script does:

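#!/usr/lpp/ssp/perl5/bin/perl

my $logfile="/var/adm/SPlogs/SPdaemon.log";
my $lastentry = "";

# Read the log through a pipe; stop with a message if it cannot be opened.
open (LOG,"cat $logfile|") ||
    die "Ops! Can't open $logfile: $!\n";

# Remember the most recent line that reports an authentication failure.
while (<LOG>) {
    if(/Authentication failed/) {
        $lastentry = $_;
    }
}

# The printed line becomes the value of the resource variable.
print "$lastentry";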


The script prints the last Authentication failed entry from the log file. If there
is no new entry, the new value will be the same as the old value; so, all we
have to do is to create a monitor for this variable that gets notified every time
the value of this variable changes. Let us take a look at the monitor’s
definition:


[sp5en0:/]# /usr/lpp/ssp/bin/pmandef -s authfailed \
-e 'IBM.PSSP.pm.User_state1:NodeNum=0-5:X@0!=X@P0' \
-c "/usr/local/bin/SaySomething.pl" \
-n 0

This command defines a monitor, through PMAN, for the
IBM.PSSP.pm.User_state1 resource variable. The expression X@0!=X@P0
means that if the previous value (X@P0) is different from the current value (X@0),
then the variable has changed. The special syntax for this variable is because
these user-defined variables are structured byte strings (SBS); so to access
the value of this variable, you have to index this structure. However, these
user-defined variables have only one field; so, only the index 0 is valid.
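
To make the effect of this expression concrete, the comparison of the current value with the previous value can be simulated over a sequence of samples. This is a minimal sketch and not how Event Management is implemented: the function name report_changes is hypothetical, and the sketch assumes that the very first sample, which has no previous value, does not count as a change.

```shell
#!/bin/sh
# Simulate the test X@0 != X@P0 over a stream of sampled values (one
# per line on stdin): report a sample only when it differs from the
# previous one. The first sample has no predecessor and is not reported.
report_changes() {
    have_prev=0
    prev=
    while IFS= read -r cur; do
        if [ "$have_prev" -eq 1 ] && [ "$cur" != "$prev" ]; then
            printf 'changed: %s\n' "$cur"
        fi
        prev=$cur
        have_prev=1
    done
}
```

Feeding it the same log line several times in a row produces no output; output appears only at the sample where the line differs from the one before it.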

You can get a complete definition of this resource variable (and others) by 
executing the following command: 

[sp5en0:/]# haemqvar "" IBM.PSSP.pm.User_state1 "*" | more

This command gives you a very good explanation along with examples on 
how to use it. 

Now that we have subscribed our monitor, let us see what the
/usr/local/bin/SaySomething.pl script does:

#!/usr/lpp/ssp/perl5/bin/perl

$cwsdisplay = "sp5en0:0";
$term = "/usr/dt/bin/aixterm";
$cmd = "/usr/local/bin/SayItLoud.pl";
$title = qq/\"Warning on node $ENV{'PMAN_LOCATION'}\"/;
$msg = $ENV{'PMAN_RVFIELD0'};
$bg = "red";
$fg = "white";
$geo = "60x5+200+100";

$execute = qq/$term -display $cwsdisplay -T $title -geometry $geo -bg $bg -fg $fg -e $cmd $msg/;

system($execute);

This script will open a warning window with a red background notifying the 
operator (it is run on node 0, the control workstation) about the intruder. 


The script, /usr/local/bin/SayItLoud.pl, will display the error log entry (the
resource variable value) inside the warning window. Let’s take a look at this
script:

#!/usr/lpp/ssp/perl5/bin/perl

print "@ARGV\n";
print "-Press Enter-\n";
<STDIN>;

Now that the monitor is active, let us try to access one of the nodes. We 
destroy our credentials (the kdestroy command), and then we try to execute a 
command on one of the nodes: 

[sp5en0:/]# kdestroy
[sp5en0:/]# dsh -w sp5n01 date
sp5n01: spk4rsh: 0041-003 No tickets file found. You need to run "k4init".
sp5n01: rshd: 0826-813 Permission is denied.
dsh: 5025-509 sp5n01 rsh had exit code 1

After a few seconds (a minute at most), we receive the warning window shown
in Figure 140 at the control workstation:



Figure 140. User-defined resource variables - Warning window example 

The example shown here is very simple. It is not intended to be complete, but 
to show how to use these user-defined resource variables. 

Information sent by the Problem Management subsystem in a notification
can be logged into different repositories for further analysis. The notify_event
script captures event information and mails it to the user running the
command on the local node.

The log_event script captures event information and logs it to a wraparound
file. The syntax for the log_event script is:

/usr/lpp/ssp/bin/log_event <log_filename>

The log_event script uses the AIX alog command to write to a wraparound file.
The size of the wraparound file is limited to 64 KB. The alog command must be
used to read the file. Refer to the AIX alog man page for more information on
this command.


13.6 Event perspectives 

The SP Perspectives are a set of applications, each of which has a graphical 
interface (GUI), that enable you to perform monitoring and system 
management tasks for your SP system by directly manipulating icons that 
represent system objects. 

Event Perspective is one of these applications. It provides a graphical
interface to the Event Management and Problem Management subsystems.

Through this interface, you can create monitors for triggering events based on 
defined conditions and generate actions by using the Problem Management 
subsystem when any of these events is triggered. 

13.6.1 Defining conditions 

The procedure for creating monitors is very straightforward. A condition 
needs to be defined prior to the creation of the monitor. 

Conditions are based on resource variables, resource identifiers, and
expressions, which, in the end, are what the Event Manager daemon
evaluates.

To better illustrate this point, let’s define a condition for a full file system. This
condition will later be used in a monitor. The following steps are required for
creating a condition:

Step 1 Decide what you want to monitor. In this step, you need to narrow
down the condition you want to monitor. For example, we want to
monitor free space in the /tmp file system. Then, we have to decide on
the particular resource we want to monitor and the condition. We
should also think of where in the SP system we want to monitor free
space in /tmp. Let us decide on that later.

Step 2 Identify the resource variable. Once you have decided on the condition
you want to monitor, you need to find the variable that represents the
particular resource associated with the condition. In our case, this is free
space in a file system.

PSSP provides some facilities to find the right variable. In releases
previous to PSSP 3.1, the only way to get information on
resource variables was through the help facility in SP Perspectives.
However, in PSSP 3.1 there is a new command that will help you find
the right variable, and it will provide you with information on how to
use it. Let’s use this new command, which is called haemqvar.

We can use this command to list all the variables related to file 
systems as follows: 

[sp3en0:/]# haemqvar -d IBM.PSSP.aixos.FS "" "*" 

IBM.PSSP.aixos.VG.free Free space in volume group, MB. 

IBM.PSSP.aixos.FS.%totused Used space in percent. 

IBM.PSSP.aixos.FS.%nodesused Percent of file nodes that are used. 

In this case, we have listed the variables within the 
IBM.PSSP.aixos.FS class. You may use the same format to list other 
classes. 

In particular, we are interested in the IBM.PSSP.aixos.FS.%totused
variable that represents exactly what we want to monitor.

Step 3 Define the expression. In order to define the expression we will use in
our condition, we need to know how to use this variable. In other
words, what are the resource identifiers for this variable? So, let us use
the haemqvar command again; but this time, let us query the specific
variable and get a full description as shown in Figure 141 on page
378.

[sp3en0:/]# haemqvar "IBM.PSSP.aixos.FS" IBM.PSSP.aixos.FS.%totused


Variable Name:         IBM.PSSP.aixos.FS.%totused
Value Type:            Quantity
Data Type:             float
Initial Value:         0.000000
Class:                 IBM.PSSP.aixos.FS
Locator:               NodeNum
Variable Description:
Used space in percent

IBM.PSSP.aixos.FS.%totused represents the percent of space in a file
system that is in use. The resource variable's resource ID specifies
the names of the logical volume (LV) and volume group (VG) of the file
system, and the number of the node (NodeNum) on which the file system
resides.

...lines not displayed...

The lsvg command can be used to list, and display information about
the volume groups defined on a node. For example:

# lsvg | lsvg -i -l
spdata:
LV NAME    TYPE    LPs  PPs  PVs  LV STATE      MOUNT POINT
spdatalv   jfs     450  450  1    open/syncd    /spdata
loglv00    jfslog  1    1    1    open/syncd    N/A
rootvg:
LV NAME    TYPE    LPs  PPs  PVs  LV STATE      MOUNT POINT
hd6        paging  64   64   1    open/syncd    N/A
hd5        boot    1    1    1    closed/syncd  N/A
hd8        jfslog  1    1    1    open/syncd    N/A
hd4        jfs     18   18   1    open/syncd    /
hd2        jfs     148  148  1    open/syncd    /usr
hd9var     jfs     13   13   1    open/syncd    /var
hd3        jfs     32   32   1    open/syncd    /tmp
hd1        jfs     1    1    1    open/syncd    /home

...lines not displayed...

When enough files have been created to use all the available
i-nodes, no more files can be created, even if the file system
has free space. The "%nodesused" resource variable can be used
to monitor the percent of file nodes which are in use.

Example expression:

To receive a notification that the file system mounted on /tmp on any
node is more than 90% full, and also receive a notification when the
percentage has subsequently dropped below 80%, one could register
for the following event using the HA_EM_CMD_REG2 command:

Resource variable:     IBM.PSSP.aixos.FS.%totused
Resource ID:           VG=rootvg;LV=hd3;NodeNum=*
Expression:            X>90
Re-arm expression:     X<80

...lines not displayed...



Figure 141. Resource variable query (partial view) 

This command gives us a complete description of the variable and
also tells us how to use it in an expression. Therefore, our expression
would be: X>90

We could use a rearm expression in our condition. A rearm expression
is optional, and it defines a second condition that Event Manager will
switch to when the main expression triggers. In our example, a rearm
expression would be X<60. This means that after the file system is
more than 90 percent used, Event Manager will send us a notification,
and then it will continue monitoring the file system; but now it will send
us a notification when the space used falls below 60 percent.
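
The interplay between the two expressions can be simulated with a small state machine. This is a minimal sketch of the behavior described above and not the Event Manager implementation: the function name watch_usage is hypothetical, the thresholds 90 and 60 are the ones from our example, and the samples are whole numbers for simplicity (the real variable is a float).

```shell
#!/bin/sh
# Simulate an event expression (X > 90) with a rearm expression (X < 60).
# Reads one percentage per line on stdin. While "armed", only the event
# expression is checked; after it triggers, only the rearm expression is
# checked until it triggers in turn.
watch_usage() {
    state=armed
    while read -r x; do
        if [ "$state" = armed ] && [ "$x" -gt 90 ]; then
            echo "event: $x"
            state=triggered
        elif [ "$state" = triggered ] && [ "$x" -lt 60 ]; then
            echo "rearm: $x"
            state=armed
        fi
    done
}
```

With the samples 50, 91, 95, 70, 59, and 92, the sketch reports an event at 91, stays silent at 95 and 70, reports the rearm at 59, and reports a new event at 92.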

Step 4 Create the condition. To create the condition, let us move the focus to 
the conditions panel on Event Perspective and then select 
Actions->Create... as shown in Figure 142. 



Figure 142. Create condition option from Event Perspectives 

Once you click on the Actions->Create... option, you will be
presented with the Create Condition panel as shown in Figure 143 on
page 380.

[Screen capture: the Create Condition - sp3en0 panel, with Name and Description fields, Resource variable classes and Resource variable names lists, a Show Details... button, Event expression and Rearm expression (optional) fields, a Resource ID elements fixed for the condition (optional) field, and Create, Apply, Cancel, Reset, and Help buttons]



Figure 143. Create Condition panel 

As you can see in the Create Condition panel, there are two initial
input boxes for the name (Name) of the condition and the description
(Description). For our example, let’s name the condition
File_System_Getting_Full and give a brief description, such as The
file system you are monitoring is getting full. Better do
something!. This is shown in Figure 144 on page 381.

[Screen capture: top of the Create Condition panel with the description "The file system you are monitoring is getting full. Better do something!" entered]


Figure 144. Defining name and description of a condition 


Now we select the resource variable class (IBM.PSSP.aixos.FS) and the
resource variable (IBM.PSSP.aixos.FS.%totused), followed by the
expression and the rearm expression we defined in the previous
step. This is shown in Figure 145.

[Screen capture: Create Condition panel with the Resource variable classes and Resource variable names lists, the Event expression field, the Rearm expression (optional) field set to X<60, and the Show Details... button]



Figure 145. Selecting resource variable and defining expression 


If you click on Show Details..., it will present you the same output we
got through the haemqvar command. We will leave the last input box
empty, which represents the resource IDs that you want to fix. For
example, this resource variable (IBM.PSSP.aixos.FS.%totused) has two
resource IDs. One is the volume group name (VG), and the other is
the logical volume name (LV). By using the last input box, we could
have fixed one or both resource IDs to a specific file system; so,
this condition could be applied to that particular file system only.
However, leaving this input blank enables us to use this condition in
any monitor.

Once the condition has been created, an icon will appear in the
Conditions panel as shown in Figure 146.

[Screen capture: Conditions panel showing icons for existing conditions, such as switchPowerLED, switchResponds, tmpFull, and varFull, together with the new File_System_Getting_Full condition]



Figure 146. Conditions panel - New condition 


13.7 Related documentation 

This documentation will help you in getting more detailed information on the 
different topics covered in the chapter. Also, remember that good hands-on 
experience may reduce the amount of preparation for the SP Certification 
exam. 

SP manuals 

The only SP manuals that can help you with this are the PSSP: Administration
Guide, SA22-7348, for PSSP 3.1, and the PSSP: Administration Guide,
GC23-3897, for PSSP 2.4. In both books, there is a section dedicated to
availability and problem management as well as SP Perspectives. We
recommend that you read at least Chapters 24 and 25 of the PSSP 3.1 guide
and Chapters 23 and 24 of the PSSP 2.4 guide.

SP redbooks 

There are several books that cover the topics in this chapter. However, we
recommend three of them. Chapters 2 and 3 of RS/6000 SP Monitoring:
Keeping it Alive, SG24-4873, will give you a good understanding of the
concepts involved. The second redbook is Inside the RS/6000 SP, SG24-5145,
which contains an excellent description of the Event Management and
Problem Management subsystems. Finally, the redbook RS/6000 SP PSSP
2.2 Technical Presentation, SG24-4868, contains detailed information on
these topics.

For a PSSP 3.1 update, we recommend Chapter 6 of PSSP 3.1
Announcement, SG24-5332.


13.8 Sample questions

This section provides a series of questions to help you prepare for
the certification exam. The answers to these questions can be found in
Appendix A.

1. The log_event utility provided with the Problem Management subsystem 
writes event information: 

A. To the SDR 

B. To the AIX error log 

C. To the /var/adm/SPlogs/pman/log directory

D. To a wraparound file using the AIX alog command

2. The problem management subsystem (PMAN) requires Kerberos 
principals to be listed in its access control list file in order to function. 
Which file needs to be updated for getting access to PMAN functionality? 

A. /etc/sysctl.acl 

B. /etc/syscal.cmds.acl 

C. /etc/pman.acl 

D. /etc/sysctl.pman.acl 

3. Which command would you use if you want to see a resource variable 
definition? 

A. SDRGetObjects EM_Resource_Variable 

B. lssrc -ls haem.sp3en0 -a <variable name | *>

C. haemqvar "<variable class | *>" "<variable name | *>" "<instance | *>"

D. lsresvar -1 <resource variable name> 

4. Although the base system (bos.rte) includes minimal services for tracing,
which of the following optional filesets do you need to install if you want to
activate the trace daemon and generate trace reports?

A. bos.trace.sysmgt 

B. bos.rte.sysmgt 

C. bos.sysmgt.rte 

D. bos.sysmgt.trace 

5. Where are the PSSP log files located?

A. /var/adm/logs

B. /var/adm/SPlogs

C. /var/ha/logs

D. /var/SPlogs

6. Event Management (EM) provides comprehensive monitoring of hardware
and software resources in the system. Which of the following does EM use
to gather information on system resources?

A. Event Management Monitors 

B. System Monitors 

C. Trace Monitors 

D. Resource Monitors 

7. Which of the following is a PMAN subsystem component? 

A. sp_configd 

B. sp_configp 

C. spmand 

D. spmanrmd 

8. What is the correct order to define a condition through SP Event 
Perspectives? 

A. Decide what you want to monitor, identify the resource variable, define 
the expression, and create the condition. 

B. Decide what you want to monitor, define the expression, identify the 
resource variable, and create the condition. 

C. Define the expression, decide what you want to monitor, identify the 
resource variable, and create the condition. 

D. Define the expression, create the condition, decide what you want to 
monitor, and identify the resource variable. 


13.9 Exercises 

Here are some exercises you may wish to perform: 

1. On a test system that does not affect any users, define a condition or event 
to monitor using Event Perspectives. 

2. What does the PMAN facility provide? On a test system that does not affect
any users, set up and test a monitor that will send a notification to the
console’s screen each time there is an authentication failure for remote
execution.


Part 4. On-going support


Chapter 14. RS/6000 SP software maintenance


This chapter discusses how to maintain backup images for the CWS and SP
nodes as well as how to recover the images you created. In addition, we
discuss how to apply the latest PTFs for AIX and PSSP. We provide the
technical steps based on the environment we set up at the
beginning of this book. Finally, we give an overview of software migration
and coexistence.


14.1 Key concepts you should study 

This section gives you key concepts for preparing for the certification exam on 
how to maintain the software on the RS/6000 SP. You should understand: 

• How to create and manage backup images for CWS and SP nodes. 

• How to restore CWS or SP nodes and what the necessary procedures are
after restoring.

• How to apply the PTFs and what the required tasks are for AIX and PSSP
on CWS and nodes.

• What the effects are of the PTFs you applied on your SP system.

• The concept of software migration and coexistence in supported
environments.

• What changes were made between PSSP V2 and PSSP V3.


14.2 Backup of the control workstation and SP node images 

Maintaining a good copy of backup images is as important as initial 
implementation of your SP system. Here we discuss how to maintain the 
CWS backup image and how to efficiently create SP node images with a 
scenario we set up in our environment. 

14.2.1 Backup of the control workstation 

The backup of the CWS is the same as the strategy you use for stand-alone 
RS/6000 servers because it has its own tape device to use for backup. In AIX, 
we usually back up the system with the command: mksysb -i <device_name> 

Remember that the mksysb command backs up only rootvg data. Thus, data
outside rootvg should be backed up with the savevg command or another
backup utility, such as sysback.

14.2.2 Backup of SP node images

In scientific or parallel computing environments, we may only need one copy 
of node images across the SP complex because, in most cases, all node 
images are identical. However, in commercial or server consolidation 
environments, we usually maintain a separate copy of a node image per 
application or even per SP node. Therefore, you need to understand your 
environment and set up the SP node backup strategy. 

In general, it is recommended to keep the size of the node’s image as small
as possible so that you can recover images quickly and manage the disk
space needed. It is also recommended that user data be kept separate
from rootvg so that you can maintain a manageable size of node images.
Here, the node image is the operating system image, not the user data image.
For user data, you should consider another strategy, such as ADSM, for
backup.

Also, remember that the node image you create is a file and is not bootable,
so you should follow the network boot process, as discussed in Chapter
9, “Frame and node installation” on page 267, to restore it.

Depending upon your environment, there are many ways you can set up an 
SP node backup. Here, we introduce the way we set it up in our environment. 

14.2.3 Case scenario: How do we set up node backup? 

In our environment, we set up sp3n01 as the boot/install server. Thus, we
created the same /spdata directory structure as on the CWS. Assuming that all
nodes have different images, we needed to create individual node images.
We NFS mounted the boot/install server node’s /spdata/sys1/install/images
directory to all nodes and the CWS’s /spdata/sys1/install/images directory to
the boot/install server node. We then ran mksysb -i
/<mount_point>/bos.obj.<hostname>.image on all nodes, including the
boot/install server node. In this way, all node images were created in the
corresponding /spdata/sys1/install/images directory as shown in Figure 147
on page 391.

Figure 147. Mechanism of SP node backup in boot/install server environment

Of course, you can write scripts to automate this process. Due to the nature 
of this book, we only introduce the mechanism for the node backup strategy. 
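
As a starting point for such a script, the following minimal sketch only builds and prints the mksysb command for one node; it does not run anything. The function name node_backup_command is hypothetical, and the mount point is whatever directory you used for the NFS mount in your own environment.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the mksysb command that would back
# up a node to the NFS-mounted images directory. Nothing is executed.
node_backup_command() {
    mount_point=$1            # where the images directory is mounted
    host=${2:-$(hostname)}    # host name used in the image file name
    printf 'mksysb -i %s/bos.obj.%s.image\n' "$mount_point" "$host"
}
```

For example, node_backup_command /mnt/images sp3n05 prints the command for node sp3n05; printing the command first makes it easy to review the file name before a real backup is started.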


14.3 Restoring from mksysb image 

In the following sections, we discuss the recovery of the CWS and nodes 
when a system has crashed. 

14.3.1 Restoring the control workstation 

You may have problems when you do software maintenance. Here, we
discuss how you can recover the CWS from a recent backup tape you
created. Restoring the CWS is similar to recovering any RS/6000 workstation
except that some post-restore activity is required.

To restore an image of the CWS, do the following:

1. Execute the normal procedure used to restore any RS/6000 workstation.

2. Issue the /usr/lpp/ssp/bin/install_cw command.

When a mksysb image is made from an existing CWS, there are certain 
ODM attributes that are not saved, such as node_number information. 
This script creates the proper node_number entry for the CWS in the 
ODM. It also executes some other functions as explained in 8.3.2, 
“install_cw” on page 253. 

3. Verify your CWS. 

14.3.2 Restoring the node 

The procedure that is used to restore the mksysb image to a node is similar to 
the installation process using NIM. You have to change some parameters in 
the original environment. 

The first step is to put the image that you want to restore in the
/spdata/sys1/install/images directory. Then, you have to change the network
environment for that node. To do this in PSSP 2.4 or earlier, execute the
following command:

# spbootins -r install -i <mksysb image name> -l <node list>

PSSP 3.1 has some modifications to the spbootins command; you do not 
have the same flags you had in PSSP 2.4 or earlier. If you try to change the 
environment using SMIT in PSSP 3.1 with the procedure just described, you 
will get a response similar to the following: 

spbootins: 0016-601 An option was used that is no longer supported by
this command.
Use the "spchvgobj" command.
spbootins: Syntax:
spbootins [ -c selected_vg ]
[ -r {install | customize | disk | maintenance | diag | migrate }][ -s
yes | no ]{start_frame start_slot node_count | -l <node_list>}
spbootins: Syntax:
spchvgobj -r <volume_group>
[ -h pv_list ]
[ -i install_image ]
[ -p code_version ]
[ -v lppsource_name ]
[ -n boot_server ]
[ -c 1 | 2 | 3 ]
[ -q true | false ]
{start_frame start_slot node_count | -l <node_list>}

The reason for the error is that the option -i, used to change the name of the
installation image, is no longer supported by spbootins in PSSP 3.1. The new
command spchvgobj should be used to change this field. This change is needed to
support the new possibility of having multiple rootvg volume groups.

To change the environment in PSSP 3.1, you run the following commands:

# spchvgobj -r rootvg -i <image name> -l <node_number>
# spbootins -r install -l <node_number>

As an example, to restore node 5 with an image called bos.obj.sp3n05.image:

# spchvgobj -r rootvg -i bos.obj.sp3n05.image -l 5
# spbootins -r install -l 5

You can verify the environment with the following command:

# splstdata -b -l 5

Check the fields response and next_install_image.

Now network boot the node to restore the correct image. You can do this on
another node, different from the original, without worrying about the node
number and its specific configuration. After the node is installed, pssp_script
customizes it with the correct information.
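
The difference between the two procedures can be summarized in a small dry-run helper that prints the commands for a given PSSP level without running them. This is a minimal sketch: the function name restore_commands is hypothetical, the -l flag is the lowercase letter L that introduces the node list, and treating every version that starts with 3 like PSSP 3.1 (and everything else like PSSP 2.4 or earlier) is an assumption of the sketch.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the commands that select a mksysb
# image for a node and set the node to install. Nothing is executed.
# $1 = PSSP version (for example 2.4 or 3.1), $2 = image name, $3 = node list
restore_commands() {
    case $1 in
        3.*)
            # PSSP 3.1: the image name is set with spchvgobj
            printf 'spchvgobj -r rootvg -i %s -l %s\n' "$2" "$3"
            printf 'spbootins -r install -l %s\n' "$3"
            ;;
        *)
            # PSSP 2.4 or earlier: spbootins still accepts -i
            printf 'spbootins -r install -i %s -l %s\n' "$2" "$3"
            ;;
    esac
}
```

For example, restore_commands 3.1 bos.obj.sp3n05.image 5 prints the two commands used in the node 5 example above.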


14.4 Applying latest AIX and PSSP PTFs 

This section describes how to apply Program Temporary Fixes (PTFs) for
AIX, PSSP, and other Licensed Program Products (LPPs) in the SP.

14.4.1 On the control workstation 

This section briefly describes how to apply AIX and PSSP PTFs on the CWS. 

14.4.1.1 Applying AIX PTFs 

The steps for applying AIX PTFs are as follows: 

1. Create a mksysb backup image of the CWS. 

2. Check that the tape is OK by listing its contents with the command: smitty
lsmksysb

3. Copy the PTFs to the lppsource directory
/spdata/sys1/install/aix432/lppsource.

4. Create a new .toc file by executing the commands:

# cd /spdata/sys1/install/aix432/lppsource

# inutoc .

5. Apply the new PTFs to the CWS using SMIT:

# smitty update_all

6. Then, update the SPOT with the PTFs in the lppsource directory using the
command:

# smitty nim_res_op

with the following as input to the menu:

Resource name: spot_aix432

Network Install Operation to perform: update_all

If the installation status is OK, you are done with the update of the AIX
PTFs on the CWS. If the installation failed, review the output for the cause
of the failure and resolve the problem.

14.4.1.2 Applying PSSP PTFs 

The steps for applying PSSP PTFs are as follows: 

1. Create a mksysb backup image of the CWS. Always check that the tape is
OK by listing its contents with the command: smitty lsmksysb

2. Copy the PTFs to the directory /spdata/sys1/install/pssplpp/PSSP-3.1 for
PSSP 3.1.

3. Create a new .toc file by issuing the following commands:

# cd /spdata/sys1/install/pssplpp/PSSP-3.1

# inutoc .

4. Check the READ THIS FIRST paper that comes with any updates to the 
PSSP and the .info files for the prerequisites, corequisites, and any 
precautions that need to be taken for installing these PTFs. Check the 
filesets in the directory you copied to see that all the required filesets are 
available. 

5. Update the new PTFs to the CWS using: 

# smitty update_all 

- Note - 

In many cases, the latest PSSP PTFs include the microcode for the
supervisor card. We strongly recommend that you check the state of the
supervisor card after applying the PSSP PTFs.

14.4.2 To the node

There are many ways you can install PTFs on the nodes. If you have a server 
consolidation environment and have different filesets installed on each node, 
it will be difficult to create one script to apply the PTFs to all the nodes at 
once. However, here we assume that we have installed the same AIX filesets 
on all the nodes. Thus, we apply the PTFs to one test node, create a script, 
and then apply the PTFs to the rest of the nodes. 

Note that before you apply the latest PTFs to the nodes, make sure you have
applied the same level of PTFs on the CWS and boot/install server nodes.

14.4.2.1 Applying AIX PTFs 

This method installs the PTFs on a node by using the SMIT and dsh commands.

Whichever option you choose, it is better to install the PTFs on one node
and test them before applying them to all the nodes. In our scenario, we
selected sp3n01 as the test node for installing the PTFs.

1. Log in as root and mount the lppsource directory of the CWS on sp3n01 by
issuing the command:

# mount sp3en0:/spdata/sys1/install/aix432/lppsource /mnt

2. Apply the PTFs using the command: 

# smitty update_all 

INPUT device for directory / software: /mnt 

First, run this with the preview only option set to yes and check that all 
prerequisites are met. If it is OK, then go ahead and install the PTFs 
with the preview only option changed back to no. 

3. Unmount the directory you mounted in step 1 using the command:

# umount /mnt 

4. If everything runs OK on the test node, then prepare the script from the 
/smit.script file for the rest of the nodes. As an example, you may create 
the following script: 

#!/usr/bin/ksh
# Name of the script: ptfinst.ksh
#
# Mount the lppsource directory exported by the CWS
mount sp3en0:/spdata/sys1/install/aix432/lppsource /mnt
# Apply all updates found in /mnt (taken from the smit.script file)
/usr/lib/instl/sm_inst installp_cmd -a -d '/mnt' -f '_update_all' '-c' \
'-N' '-g' '-X'
umount /mnt

5. Make the file executable and owned by the root user:

# chmod 744 /tmp/ptfinst.ksh

# chown root.system /tmp/ptfinst.ksh

6. Copy to the rest of the nodes with the command: 

# hostlist | pcp -w - /tmp/ptfinst.ksh /tmp

7. Execute the script using dsh except on the test node. 

While installing the PTFs, if you get any output saying that a reboot is 
required for the PTFs to take effect, you should reboot the node. Before 
rebooting a node, if you have a switch, you may need to fence it using the 
command: 

# Efence -autojoin sp3n01 
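
Step 7 runs the script on every node except the test node. One way to build that node list is to filter the test node out of a host list before it reaches dsh. The following is a minimal sketch: exclude_node is a hypothetical helper, not a PSSP command, and the printf line only stands in for the output of hostlist.

```shell
#!/bin/sh
# exclude_node NODE
# Hypothetical helper (not a PSSP command): copies the host names read from
# standard input to standard output, dropping any line that is exactly NODE.
exclude_node() {
    # -v inverts the match, -x matches whole lines only, -F disables regex
    grep -v -x -F -e "$1"
}

# Stand-in for the output of the hostlist command, one host name per line.
printf '%s\n' sp3n01 sp3n05 sp3n06 | exclude_node sp3n01

# On the CWS the filtered list might then be passed on, for example:
#   hostlist | exclude_node sp3n01 | dsh -w - /tmp/ptfinst.ksh
# (assumption: dsh accepts "-w -" to read host names from standard input,
# as pcp does in step 6)
```

Matching whole lines keeps a node such as sp3n010 in the list when sp3n01 is excluded.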


14.4.2.2 Applying PSSP PTFs 

Applying PSSP PTFs to the nodes can be done with the same methods we 
used for applying AIX PTFs. Before applying the PTFs, make a backup image 
for the node. 

To install PSSP PTFs, follow the same procedure except for step 1; you
need to mount the PSSP PTFs directory instead of the lppsource directory.
The command is:

# mount sp3en0:/spdata/sys1/install/pssplpp/PSSP-3.1 /mnt

When updating the ssp.css fileset of PSSP, you must reboot the nodes for
the kernel extensions to take effect.

It is recommended to make another backup image after you have applied the 
PTFs. 


14.5 Software migration and coexistence 

In earlier chapters, we discussed what is available in AIX and PSSP software 
levels. This section discusses the main changes driven by PSSP 3.1 when 
you migrate your system to PSSP 3.1 and AIX 4.3.2. 

Because migration of your CWS, your nodes, or both is a complex task, you
must plan carefully before you attempt to migrate. A full migration plan
breaks your migration tasks down into distinct, verifiable (and recoverable)
steps and plans the requirements for each step. A well-planned migration has
the added benefit of minimizing system downtime.

14.5.1 Migration terminology

An AIX level is defined as <Version>.<Release>.<Modification>. A migration 
is a process of changing to a newer version or release, while an update is a 
process of changing to a new modification level. In other words, if you change 
the AIX level from 4.2 to 4.3, it is a migration, while if you change the AIX 
level from 4.3.1 to 4.3.2, it is an update. However, all PSSP level changes are 
updates. 
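
The distinction for AIX levels can be sketched as a small shell function. This is only an illustration: classify_change is a hypothetical helper, not an AIX or PSSP command, and it assumes levels written as dot-separated numbers such as 4.3.2. It does not apply to PSSP levels, whose changes are all updates.

```shell
#!/bin/sh
# classify_change OLD NEW
# Hypothetical helper: compares two AIX levels written as
# <Version>.<Release>[.<Modification>] and prints
#   migration - the version or release differs
#   update    - only the modification level differs
#   same      - the levels are identical
classify_change() {
    # Keep only the first two dot-separated fields (version and release)
    old_vr=$(echo "$1" | cut -d. -f1,2)
    new_vr=$(echo "$2" | cut -d. -f1,2)
    if [ "$old_vr" != "$new_vr" ]; then
        echo migration
    elif [ "$1" != "$2" ]; then
        echo update
    else
        echo same
    fi
}

classify_change 4.2.1 4.3.2    # release changes
classify_change 4.3.1 4.3.2    # only the modification level changes
```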

14.5.2 Supported migration paths 

In PSSP 3.1, the only supported paths are those shown in Table 27. If your 
current system, CWS, or any node is running at a PSSP or AIX level not listed 
in the From column of Table 27, you must update to one of the listed 
combinations before you can migrate to PSSP 3.1. For the detailed migration
procedure, refer to the PSSP Installation and Migration Guide, GA22-7347.


Table 27. Supported migration paths to PSSP 3.1

From PSSP Level   From AIX Level   To PSSP Level   To AIX Level
2.2               4.1.5, 4.2.1     3.1             4.3.2
2.3               4.2.1, 4.3.2     3.1             4.3.2
2.4               4.2.1, 4.3.2     3.1             4.3.2


You can migrate the AIX level and update the PSSP level at the same time.
However, we recommend migrating the AIX level first without changing the
PSSP level and verifying system stability and functionality. Then, update
PSSP.
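
The From columns of Table 27 can be written down as a small lookup when planning which nodes can go directly to PSSP 3.1 and AIX 4.3.2. The sketch below is illustrative only: is_supported_start is a hypothetical helper, and the level pairs are copied from the table rather than queried from the SDR.

```shell
#!/bin/sh
# is_supported_start PSSP_LEVEL AIX_LEVEL
# Hypothetical helper: succeeds (exit status 0) when the PSSP/AIX pair
# appears in the From columns of Table 27, fails (exit status 1) otherwise.
is_supported_start() {
    case "$1:$2" in
        2.2:4.1.5|2.2:4.2.1) return 0 ;;
        2.3:4.2.1|2.3:4.3.2) return 0 ;;
        2.4:4.2.1|2.4:4.3.2) return 0 ;;
        *) return 1 ;;
    esac
}

if is_supported_start 2.4 4.2.1; then
    echo "PSSP 2.4 with AIX 4.2.1: direct migration path available"
fi

if ! is_supported_start 2.1 4.1.4; then
    echo "PSSP 2.1 with AIX 4.1.4: update to a listed combination first"
fi
```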

However, even if you have found your migration path, some products or 
components of PSSP have limitations that might restrict your ability to 
migrate: 

• Switch Management 

• RS/6000 Cluster Technology 

• Performance Toolbox Parallel Extensions 

• High Availability Cluster Multi-Processing 

• IBM Virtual Shared Disk 

• IBM Recoverable Virtual Shared Disk 

• General Parallel File System

• Parallel Environment

• LoadLeveler 

• Parallel Tools 

• PIOFS, CLIO/S, and NetTAPE 

• Extension node support 

For more information about these limitations, refer to the document RS/6000: 
Planning Volume 2, GA22-7281. 

14.5.3 Migration planning 

In many cases, we recommend migration rather than a new installation
because migration preserves the local system changes you have made,
such as:

• Users and groups (user settings such as passwords, profiles, and login
shells are preserved).

• File systems and volume groups (where names, parameters, sizes, and 
directories are kept). 

• RS/6000 SP setup (AMD, File Collections). 

• Network setup (TCP/IP, SNA). 

Before migrating, you may want to create one or more system partitions. As 
an option, you can create a production system partition with your current AIX 
and PSSP level software and a test system partition with your target level of 
AIX and PSSP 3.1 level software. 

Before you migrate any of your nodes, you must migrate your CWS and 
boot/install server node to the latest level of AIX and PSSP of any node you 
wish to serve. After these general considerations, we now give some details 
of the migration process at the CWS level and then at the node level. 

14.5.4 Overview of CWS PSSP update 

This section briefly describes what is new in PSSP 3.1 for updating the CWS.
For further information, refer to the PSSP Installation and Migration Guide,
GA22-7347.

We describe the main steps in the installation process but with the migration
goal in mind. We also assume the migration of the CWS to AIX 4.3.2 has
been done successfully.

1. Create the required /spdata directories, such as
/spdata/sys1/install/aix432/lppsource and
/spdata/sys1/install/pssplpp/PSSP-3.1.

# mkdir -p /spdata/sys1/install/aix432/lppsource

# mkdir -p /spdata/sys1/install/pssplpp

2. Copy the AIX LPP images and the other required AIX LPPs from the AIX 4.3.2
media to /spdata/sys1/install/aix432/lppsource on the CWS.

3. Verify the correct level of PAIDE (perfagent).

The perfagent.server fileset must be installed and copied to all of the
lppsource directories on the CWS of any SP that has one or more nodes at
PSSP 2.4 or earlier.

The perfagent.tools fileset is part of AIX 4.3.2. This product provides the
capability to monitor the performance of your SP system, collects and
displays statistical data for SP hardware and software, and simplifies
run-time performance monitoring of a large number of nodes. This fileset
must be installed and copied to all of the lppsource directories on the CWS
of any SP that has one or more nodes at PSSP 3.1.

4. Copy the PSSP images for PSSP 3.1 into the
/spdata/sys1/install/pssplpp/PSSP-3.1 directory, rename the PSSP
package to pssp.installp, and create the .toc file.

# bffcreate -qvX -t /spdata/sys1/install/pssplpp/PSSP-3.1 -d \
/dev/rmt0 all

# cd /spdata/sys1/install/pssplpp/PSSP-3.1

# mv ssp.usr.3.1.0.0 pssp.installp

# inutoc .

5. Copy an installable image (mksysb format) for the node into
/spdata/sys1/install/images.

6. Stop the daemons on the CWS and verify. 

Issue the lssrc -a command to verify that the daemons are no longer
running on the CWS.

# syspar_ctrl -G -k 

# stopsrc -s sysctld 

# /etc/amd/amq (PSSP 2.2 users only) (see note) 

# stopsrc -s splogd 

# stopsrc -s hardmon 

# stopsrc -g sdr 

7. Install PSSP on the CWS.

The PSSP 3.1 filesets are packaged to be installed on top of previously
supported releases. You may install all the filesets available or only the
minimum filesets in the PSSP 3.1 package.

To properly set up the PSSP 3.1 on the CWS for the SDR, Hardmon, and 
other SP-related services, issue the following command: 

# install_cw 

8. Update the state of the supervisor microcode. 

Check which supervisors need to be updated by using SMIT panels or by 
issuing the spsvrmgr command: 

# spsvrmgr -G -r status all 

In case an action is required, you can update the microcode by issuing the 
command: 

# spsvrmgr -G -u <frame_number>:<slot_number> 

9. Refresh all the partition sensitive subsystem daemons. 

10. Migrate shared disks. 

If you already use Virtual Shared Disk (VSD), you have some preparation
to do.

14.5.5 Overview of node migration 

You cannot migrate the nodes until you have migrated the CWS and 
boot/install servers to your target AIX level (4.3.2) and PSSP 3.1. You can 
migrate the nodes to your AIX level and PSSP 3.1 in one of three ways: 

• Migration Install 

This method preserves all the file systems except /tmp as well as the root 
volume group, logical volumes, and system configuration files. This 
method requires the setup of AIX NIM on the new PSSP 3.1 CWS and 
boot/install servers. This applies only to migrations when an AIX version 
or release is changing. 

• mksysb Install 

This method erases the current rootvg entirely and installs your target
AIX level and PSSP 3.1 using an AIX 4.3.2 mksysb image for the node.
This installation requires the setup of AIX NIM on the new PSSP 3.1 CWS 
or boot/install servers. 

• Upgrade 

This method preserves all occurrences of the current rootvg and installs
AIX PTF updates using the installp command. This method applies to
AIX modification level changes or when the AIX level is not changing, but
you are updating to a new level of PSSP.

To identify the appropriate method, you must use the information in Table 8 in 
the document PSSP Installation and Migration Guide, GA22-7347. 

Although the way to migrate a node has not changed with PSSP 3.1, we point 
out here how the PSSP 3.1 enhancements can be used when you want to 
migrate. 

1. Migration install of nodes to PSSP 3.1 

Set the bootp_response parameter to migrate for the node you migrate
with the new PSSP 3.1 commands (spchvgobj, spbootins).

If we migrate nodes 5 and 6 from AIX 4.2.1 and PSSP 2.4 to AIX 4.3.2
and PSSP 3.1, we issue the following commands, assuming the lppsource
directory is /spdata/sys1/install/aix432/lppsource:

# spchvgobj -r rootvg -p PSSP-3.1 -v aix432 -l 5,6

# spbootins -r migrate -l 5,6

The SDR is now updated and setup_server will be executed. Verify this
with the command: splstdata -G -b -l <node_list>

Finally, a shutdown followed by a network boot will migrate the node. The 
AIX part will be done by NIM; whereas, the script, pssp_script, does the 
PSSP part. 

2. mksysb install of nodes 

This is the node installation that we discussed in Chapter 9, “Frame and 
node installation” on page 267. 

3. Update to a new level of PSSP and update to a new modification level of 
AIX. 

If you are on AIX 4.3.1 and PSSP 2.4 and you want to go to AIX 4.3.2 and
PSSP 3.1, you must first update the AIX level of the node by mounting the
aix432 lppsource directory from the CWS on your node and running the
installp command.

Then, after you have the right AIX level installed on your node, you must
set the bootp_response parameter to customize with the new PSSP 3.1
commands (spchvgobj, spbootins) for nodes 5 and 6.

# spchvgobj -r rootvg -p PSSP-3.1 -v aix432 -l 5,6

# spbootins -r customize -l 5,6

Then, copy the pssp_script file from the CWS to the node:

# pcp -w <node> /spdata/sys1/install/pssp/pssp_script \
/tmp/pssp_script

After the copy is done, execute the pssp_script that updates the node's
PSSP to the new PSSP 3.1 level.
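
The command pairs in items 1 and 3 differ only in the response value given to spbootins (migrate or customize). A dry-run helper that prints the commands for review before they are typed on the CWS can make that explicit. This is a sketch under stated assumptions: print_node_cmds is a hypothetical name, the PSSP-3.1 and aix432 values are the ones used in our environment, and the function only echoes text; it never runs a PSSP command. The node list flag is -l (lowercase L).

```shell
#!/bin/sh
# print_node_cmds RESPONSE NODE_LIST
# Hypothetical dry-run helper: prints, but does not execute, the CWS
# commands used above to prepare nodes for a migration install
# (RESPONSE=migrate) or a PSSP update (RESPONSE=customize).
print_node_cmds() {
    case "$1" in
        migrate|customize) ;;
        *) echo "unsupported response: $1" >&2; return 1 ;;
    esac
    # Set the PSSP code version and lppsource name for the root volume group
    echo "spchvgobj -r rootvg -p PSSP-3.1 -v aix432 -l $2"
    # Set the bootp_response value for the nodes
    echo "spbootins -r $1 -l $2"
    # List the boot/install data so the change can be verified
    echo "splstdata -G -b -l $2"
}

print_node_cmds migrate 5,6
```

Printing rather than executing keeps the sketch safe to try on any workstation; the printed lines can be compared against the procedure before they are run.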

14.5.6 Coexistence 

PSSP 3.1 can coexist with PSSP 2.2 and later. Coexistence is the ability to 
have multiple levels of AIX and PSSP in the same partition. 

Table 28 shows what AIX levels and PSSP levels are supported by PSSP 3.1 
in the same partition. Any combination of PSSP levels listed in this table can 
coexist in a system partition. So, you can migrate to a new level of PSSP or 
AIX one node at a time. 


Table 28. Possible AIX or PSSP combinations in a partition

AIX Levels                PSSP Levels
AIX 4.1.5 or AIX 4.2.1    PSSP 2.2
AIX 4.2.1 or AIX 4.3.2    PSSP 2.3
AIX 4.2.1 or AIX 4.3.2    PSSP 2.4
AIX 4.3.2                 PSSP 3.1


Some PSSP components and related LPPs still have some limitations. Also, 
many software products have PSSP and AIX dependencies. 


14.6 Related documentation 

This study guide provides only key points, so we recommend that you
review the following reference books for details.

SP Manuals 

Refer to Chapter 6 of PSSP: Installation and Migration Guide, GC23-3898, 
and PSSP Installation and Migration Guide, GA22-7347. For details on how to 
boot from the mksysb tape, read the AIX V4.3 Quick Install and Startup 
Guide, SC23-4111. 

SP Redbooks 

RS/6000 SP Software Maintenance, SG24-5160. This redbook provides
everything you need for software maintenance. We strongly recommend
reading it before real production work. For the sections on backup and PTFs,
you may refer to Chapter 7 and Chapter 8. For the section on software
migration, you may read Chapter 2 of PSSP 3.1 Announcement, SG24-5332.


14.7 Sample questions

This section provides a series of questions to help you prepare for
the certification exam. The answers to these questions can be found in
Appendix A.

1. You have applied the latest PSSP fixes to the CWS. A message posted 
during fix installation states that a microcode update for high nodes is 
included in this fix. You query the status of your high node supervisor 
microcode and get the following output: 


Frame Slot Supervisor Media        Installed    Required
           State      Versions     Version      Action
1     9    Active     u_10.3a.0612 u_10.3a.0614 Upgrade
                      u_10.3a.0614
                      u_10.3a.0615

What command is used to update the supervisor microcode on the high
nodes?

A. spucode 

B. spsvrmgr 

C. spmicrocode 

D. sphardware 

2. You have applied the latest PSSP fixes to the CWS. What is a 
recommended task to perform? 

A. Check the state of all supervisor microcode.

B. Delete and re-add all system partition-sensitive daemons. 

C. Stop and restart the NTP daemon on all nodes. 

D. Remove and reacquire the administrative Kerberos ticket. 

3. Which of the following is a supported migration path? 

A. AIX 3.2.5/PSSP 2.1 ===> AIX 4.2.1/PSSP 3.1 

B. AIX 4.2.1/PSSP 2.4 ===> AIX 4.3.2/PSSP 3.1 

C. AIX 4.1.4/PSSP 2.1 ===> AIX 4.3.2/PSSP 2.2 

D. AIX 4.1.5/PSSP 2.3 ===> AIX 4.1.5/PSSP 2.4

4. Which of the following is NOT a supported migration path? 

A. AIX 4.2.1/PSSP 2.2 ===> AIX 4.3.2/PSSP 3.1 

B. AIX 4.3.2/PSSP 2.4 ===> AIX 4.3.2/PSSP 3.1 

C. AIX 4.1.5/PSSP 2.2 ===> AIX 4.3.2/PSSP 2.2

D. AIX 4.1.5/PSSP 2.2 ===> AIX 4.2.1/PSSP 2.3


5. Which of the following commands is part of the procedures to restore an 
image of the CWS? 

A. spbootins -r install -i <mksysb image name> -l

B. /usr/lpp/ssp/bin/install_cw 

C. mksysb -i /<mount_point>/bos.obj.<hostname>.image 

D. spbootins -r install -l <node_number>


14.8 Exercises 

Here are some exercises you may wish to perform: 

1. On a test system that does not affect any users, perform a control 
workstation backup, and a node backup. 

2. Apply the latest AIX and PSSP PTFs to the control workstation. 

3. Apply the latest AIX and PSSP PTFs to a node. 

4. Perform another backup for the control workstation and the node. 

5. Migrate the control workstation to AIX 4.3.2. Then update the CWS PSSP 
level to PSSP 3.1. 

6. After performing exercise 4, migrate the node to AIX 4.3.2 and PSSP 3.1. 

7. Restore the control workstation to its original state. Then restore the node
to its original state.


Chapter 15. RS/6000 SP reconfiguration and update


Most commercial environments start with a small number of nodes and
expand their environment as time goes by or new technology becomes
available. In Chapters 7 and 8, we discussed the key commands and files
used for initial implementation based on our environment. In this chapter, we
go through the procedures used to reconfigure an SP system, such as
adding a frame, nodes, and switches, which are the most frequent activities
you may face. Then, we describe the activities required to replace an
existing MCA-based uniprocessor node with a PCI-based 332 MHz SMP node.


15.1 Key concepts you should study 

This section gives you the key concepts you have to understand when you 
prepare for the certification exam on reconfiguration and migration of 
RS/6000 SP. You should understand: 

• The types of SP nodes and what the differences are among the nodes. 

• What the procedures are when you add new frames or SP nodes as well 
as the software and hardware requirements. 

• How to reconfigure the boot/install server when you set up a multi-frame 
environment. 

• How to replace existing MCA-based uniprocessor nodes or SMP nodes with
the new PCI-based 332 MHz SMP node along with its software and
hardware requirements and procedures.

• The technology updates on PSSP V3. 


15.2 Environment 

This section describes the environment for our RS/6000 SP system. From the 
initial RS/6000 SP system, we added a second switched frame and added 
one high node, four thin nodes, two Silver nodes, and three wide nodes as 
shown in Figure 148 on page 406. 

In Figure 148, sp3n17 is set up as the boot/install server. The Ethernet
adapter (en0) of sp3n17 is cabled to the same segment (subnet 3) as the en0
of sp3n01 and the CWS. The en0 adapters of the rest of the nodes in frame 2
are cabled with the en1 of sp3n17 so that they are in the same segment
(subnet 32).

Thus, we install sp3n17, which is a boot/install server, first from the CWS.
Then we install the rest of the nodes from sp3n17. In the following sections,
we summarize the steps for adding frames, nodes, and SP switches from the
PSSP Installation and Migration Guide, GA22-7347, even though the physical
installation was done at the same time.

Figure 148. Environment after adding a second switched frame and nodes


15.3 Adding a frame 

In our environment, we assigned sp3n17 as the boot/install server node.
Thus, we added en0 of sp3n17 on subnet 3 and en1 of sp3n17 on subnet 32
so that en1 will be a gateway to reach the CWS from the nodes in Frame 2.

With this configuration, we summarized the steps as follows:

- Note -

Before performing the following tasks, make sure you hold a valid Kerberos
ticket from the RS/6000 SP authentication services: check for one with the
klist command and, if needed, obtain one with the k4init command.


1. Archive the SDR on the CWS. Every time you reconfigure your system, it is
strongly recommended that you back up the SDR with the command:

[sp3en0:/]# SDRArchive

SDRArchive: SDR archive file name is
/spdata/sys1/sdr/archives/backup.98350.1559

In case something goes wrong, you can simply restore with the command:

SDRRestore <archive_file>

2. Un-partition your system (optional) from the CWS. 

If your existing system has multiple partitions defined and you want to add 
a frame that has a switch, you need to bring the system down to one 
partition by using the Eunpartition command before you can add the 
additional frame. 

3. Connect the frame with RS-232 and recable the Ethernet adapters (en0),
as described in 15.2, “Environment” on page 405, to your CWS.

4. Configure the RS-232 control line.

Each frame in your system requires a serial port on the CWS configured to
accommodate the RS-232 line. Note that SP-attached servers require two
serial lines. Define tty1 for the second frame:

[sp3en0:/]# mkdev -c tty -t 'tty' -s 'rs232' -p 'sa1' -w 's2'

5. Enter frame information and reinitialize the SDR. 

For SP frames, this step creates frame objects in the SDR for each frame 
in your system. At the end of this step, the SDR is reinitialized resulting in 
the creation of node objects for each node attached to your frames. 

- Note - 

You must perform this step once for SP frames and once for non-SP 
frames (SP-Attached servers). You do not need to reinitialize the SDR until 
you are entering the last set of frames (SP or non-SP). 


Specify the spframe command with -r yes to reinitialize the SDR (when
running the command for the final series of frames), a starting frame
number, a frame count, and the starting frame's tty port.

In our environment, we enter information for two frames (Frame 1 to
Frame 2), indicate that Frame 1 is connected to /dev/tty0 and Frame 2
to /dev/tty1, and reinitialize the SDR:

[sp3en0:/]# spframe -r yes 1 2 /dev/tty0

0513-044 The stop of the splogd Subsystem was completed successfully. 
0513-059 The splogd Subsystem has been started. Subsystem PID is 111396. 

- Note - 

If frames are not contiguously numbered, repeat this step for each series 
of contiguous frames. 


As a new feature of PSSP 3.1, SP-attached servers are supported.
SP-attached servers also require frame objects in the SDR; they are defined
as non-SP frames, and one object is required for each S70, S70 Advanced, or
S80 attached to your SP.

The S70, S70 Advanced, and S80 require two tty port values to define the tty
ports on the CWS to which the serial cables connected to the server are
attached. The spframe tty port value defines the serial connection to the
operator panel on the S70, S70 Advanced, and S80 for hardware controls. The
s1 tty port value defines the connection to the serial port on the S70, S70
Advanced, and S80 for serial terminal (s1term) support. A switch port value is
required for each S70, S70 Advanced, or S80 attached to your SP.

Specify the spframe command with the -n option for each series of contiguous 
non-SP frames. Specify the -r yes option when running the command for the 
final series of frames. 

If you have two S70 servers (frames 3 and 4), the first server has the
following characteristics:

Frame Number: 3 

tty port for operator panel connection: /dev/tty2 
tty port for serial terminal connection: /dev/tty3 
switch port number: 14 

And, the second server has the following characteristics: 

Frame Number: 4

tty port for operator panel connection: /dev/tty4
tty port for serial terminal connection: /dev/tty5
switch port number: 15

To define these servers to PSSP and reinitialize the SDR, enter: 

# spframe -r yes -n 14 3 2 /dev/tty2 

- Note - 

The SP-Attached server in your system will be represented with the node 
number corresponding to the frame defined in this step. Continue with the 
remaining installation steps to install the SP-Attached server as an SP 
node. 


6. Verify frame information with the command: splstdata -f or spmon -d
The output looks as follows:

[sp3en0:/]# splstdata -f

List Frame Database Information

frame#  tty        s1_tty  frame_type  hardware_protocol
1       /dev/tty0  ""      switch      SP
2       /dev/tty1  ""      switch      SP
[sp3en0:/]# spmon -d 

1. Checking server process 

Process 16264 has accumulated 0 minutes and 0 seconds. 

Check ok 

2. Opening connection to server 
Connection opened 

Check ok 

3. Querying frame(s) 

2 frame(s) 

Check ok 

4. Checking frames 

This step was skipped because the -G flag was omitted. 

5. Checking nodes 

- Frame 1 -

Frame  Node    Node         Host/Switch   Key     Env   Front Panel     LCD/LED is
Slot   Number  Type  Power  Responds      Switch  Fail  LCD/LED         Flashing
1      1       high  on     yes   yes     normal  no    LCDs are blank  no
5      5       thin  on     yes   yes     normal  no    LEDs are blank  no
6      6       thin  on     yes   yes     normal  no    LEDs are blank  no
7      7       thin  on     yes   yes     normal  no    LEDs are blank  no
8      8       thin  on     yes   yes     normal  no    LEDs are blank  no
9      9       thin  on     yes   yes     normal  no    LEDs are blank  no
10     10      thin  on     yes   yes     normal  no    LEDs are blank  no
11     11      thin  on     yes   yes     normal  no    LEDs are blank  no
12     12      thin  on     yes   yes     normal  no    LEDs are blank  no
13     13      thin  on     yes   yes     normal  no    LEDs are blank  no
14     14      thin  on     yes   yes     normal  no    LEDs are blank  no
15     15      wide  on     yes   yes     normal  no    LEDs are blank  no

- Frame 2 -

Frame  Node    Node         Host/Switch   Key     Env   Front Panel     LCD/LED is
Slot   Number  Type  Power  Responds      Switch  Fail  LCD/LED         Flashing
1      17      high  on     no    notcfg  normal  no    LCDs are blank  no
5      21      thin  on     no    notcfg  normal  no    LEDs are blank  no
6      22      thin  on     no    notcfg  normal  no    LEDs are blank  no
7      23      thin  on     no    notcfg  normal  no    LEDs are blank  no
8      24      thin  on     no    notcfg  normal  no    LEDs are blank  no
9      25      thin  on     no    notcfg  N/A     no    LCDs are blank  no
10     26      thin  on     no    notcfg  N/A     no    LCDs are blank  no
11     27      wide  on     no    notcfg  normal  no    LEDs are blank  no
13     29      wide  on     no    notcfg  normal  no    LEDs are blank  no
15     31      wide  on     no    notcfg  normal  no    LEDs are blank  no


Note that SP-attached servers are represented as a one-node frame. If an
error occurred, the frame must be deleted using the spdelfram command prior
to reissuing the spframe command. After updating the RS-232 connection to
the frame, you should reissue the spframe command.


15.4 Adding a node 

In our environment, we add one high node as the second boot/install server,
four thin nodes, two Silver nodes, and three wide nodes as shown in Figure 148
on page 406. Assume that all nodes were installed when the frame was installed.
Thus, the following steps are the continuation of 14.1, “Key concepts you
should study” on page 389. After we enter all node information into the SDR,
we will install sp3n17 first and then install the rest of the nodes.

1. Gather all information that you need: 

• Hostnames for all nodes 

• IP address for all nodes 

• Default gateway information, and so on. 

2. Archive the SDR with the command: SDRArchive 

3. Update the /etc/hosts file or DNS map with new IP addresses on the CWS.
Note that if you do not update the /etc/hosts file now, the spethernt
command fails.

4. Check the status and update the state of the supervisor microcode with 
the command: spsvrmgr 

The output looks like this:

[sp3en0:/]# spsvrmgr -G -r status all

spsvrmgr: Frame Slot Supervisor Media        Installed    Required
                     State      Versions     Version      Action
          1     0    Active     u_10.1c.0709 u_10.1c.070c None
                                u_10.1c.070c
                1    Active     u_10.3a.0614 u_10.3a.0615 None
                                u_10.3a.0615
                17   Active     u_80.19.060b u_80.19.060b None
          2     0    Active     u_10.3c.0709 u_10.3c.070c None
                                u_10.3c.070c
                1    Active     u_10.3a.0614 u_10.3a.0615 None
                                u_10.3a.0615
                9    Active     u_10.3e.0704 u_10.3e.0706 None
                                u_10.3e.0706
                10   Active     u_10.3e.0704 u_10.3e.0706 None
                                u_10.3e.0706
                17   Active     u_80.19.060b u_80.19.060b None


In our environment, no Required Action needs to be taken.
However, if you need to update the microcode of the frame supervisor of
frame 2, enter:

# spsvrmgr -G -u 2:0 

5. Enter the required en0 adapter information with the command: spethernt 

[sp3en0:/etc]# spethernt -s no -l 17 192.168.3.117 255.255.255.0 192.168.3.130 
[sp3en0:/etc]# spethernt -s no -l 21 192.168.32.121 255.255.255.0 192.168.32.117 
[sp3en0:/etc]# spethernt -s no -l 22 192.168.32.122 255.255.255.0 192.168.32.117 
[sp3en0:/etc]# spethernt -s no -l 23 192.168.32.123 255.255.255.0 192.168.32.117 
[sp3en0:/etc]# spethernt -s no -l 24 192.168.32.124 255.255.255.0 192.168.32.117 
[sp3en0:/etc]# spethernt -s no -l 25 192.168.32.125 255.255.255.0 192.168.32.117 
[sp3en0:/etc]# spethernt -s no -l 26 192.168.32.126 255.255.255.0 192.168.32.117 
[sp3en0:/etc]# spethernt -s no -l 27 192.168.32.127 255.255.255.0 192.168.32.117 
[sp3en0:/etc]# spethernt -s no -l 29 192.168.32.129 255.255.255.0 192.168.32.117 
[sp3en0:/etc]# spethernt -s no -l 31 192.168.32.131 255.255.255.0 192.168.32.117 
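The commands for nodes 21 through 31 differ only in the node number and the last octet of the address, so they can be generated rather than typed. The following is a minimal dry-run sketch: it only prints the command lines and does not run spethernt. The function name gen_spethernt is hypothetical, and the node list, netmask, and gateway are simply the values from this example.

```sh
#!/bin/sh
# Dry run: print one spethernt command per node instead of executing it.
# gen_spethernt is a hypothetical helper; the values mirror the example above.
gen_spethernt() {
    netmask=255.255.255.0
    gateway=192.168.32.117      # en1 address of boot/install server sp3n17
    for node in 21 22 23 24 25 26 27 29 31; do
        # In this example the host part of the en0 address is 100 + node number.
        echo "spethernt -s no -l $node 192.168.32.1$node $netmask $gateway"
    done
}

gen_spethernt
```

Review the printed lines before pasting them into a root session on the CWS.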

If you are adding an extension node to your system, you may want to enter 
the required node information now. For more information, refer to Chapter 
9 of PSSP Installation and Migration Guide, GA22-7347. 

6. Acquire the hardware Ethernet addresses with the command: sphrdwrad 

This step gets hardware Ethernet addresses for the en0 adapters for your 
nodes from the nodes themselves and puts them into the Node Objects in 
the SDR. This information is used to set up the /etc/bootptab files for your 
boot/install servers. 

To get all hardware Ethernet addresses for the nodes specified in the 
node list (the -l flag), enter: 

[sp3en0:/]# sphrdwrad -l 17,21,22,23,24,25,26,27,29,31 

A sample output looks like: 

Acquiring hardware Ethernet address for node 17 
Acquiring hardware Ethernet address for node 21 
Acquiring hardware Ethernet address for node 22 
Acquiring hardware Ethernet address for node 23 
Acquiring hardware Ethernet address for node 24 
Acquiring hardware Ethernet address for node 25 
Acquiring hardware Ethernet address for node 26 
Acquiring hardware Ethernet address for node 27 
Acquiring hardware Ethernet address for node 29 
Acquiring hardware Ethernet address for node 31 
Hardware ethernet address for node 17 is 02608C2E86CA 
Hardware ethernet address for node 21 is 10005AFA0518 
Hardware ethernet address for node 22 is 10005AFA17E3 
Hardware ethernet address for node 23 is 10005AFA1721 
Hardware ethernet address for node 24 is 10005AFA07DF 
Hardware ethernet address for node 25 is 0004AC4947E9 
Hardware ethernet address for node 26 is 0004AC494B40 
Hardware ethernet address for node 27 is 02608C2E7643 
Hardware ethernet address for node 29 is 02608C2E7C1E 
Hardware ethernet address for node 31 is 02608C2E78C9 


- Note - 

• Do not do this step on a running production system because it shuts 
down the nodes. 

• Select only the new nodes you are adding. All the nodes you select are 
powered off and back on. 

• The nodes for which you are obtaining Ethernet addresses must be 
physically powered on when you perform this step. No ttys can be 
opened in write mode. 
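Because the acquired addresses end up in the SDR and in /etc/bootptab, it can be worth checking that each one is a well-formed 12-digit hexadecimal string before you continue. The following is a minimal sketch, assuming the output lines have the form shown in the sample above; the function name check_enet_addrs is hypothetical and is not part of PSSP.

```sh
#!/bin/sh
# Reads sphrdwrad-style output on stdin and reports malformed addresses.
# check_enet_addrs is a hypothetical helper, not a PSSP command.
check_enet_addrs() {
    awk '
        /^Hardware ethernet address for node/ {
            addr = $NF                 # last field is the address
            if (length(addr) != 12 || addr !~ /^[0-9A-Fa-f]+$/) {
                print "node " $6 ": bad address " addr
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }     # non-zero exit if any address is malformed
    '
}

# Example with one line from the sample output above.
echo "Hardware ethernet address for node 17 is 02608C2E86CA" | check_enet_addrs
```

The function exits with a non-zero status when at least one address fails the check, so it can guard the next step in a script.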

7. Verify the Ethernet addresses with the command: splstdata -b 

[sp3en0:/]# splstdata -b 

A sample output looks like: 

                  List Node Boot/Install Information 

node#  hostname          hdw_enet_addr  srvr  response  install_disk 
       last_install_image  last_install_time   next_install_image  lppsource_name 
       pssp_ver  selected_vg 

    1  sp3n01.msc.itso.  02608CF534CC      0  disk      hdisk0 
       bos.obj.ssp.432     Thu_Dec_3_11:18:20  bos.obj.ssp.432     aix432 
       PSSP-3.1  rootvg 
    5  sp3n05.msc.itso.  10005AFA13AF      1  disk      hdisk0 
       bos.obj.ssp.432     Thu_Dec_3_15:59:40  bos.obj.ssp.432     aix432 
       PSSP-3.1  rootvg 
    6  sp3n06.msc.itso.  10005AFA1B12      1  disk      hdisk0 
       bos.obj.ssp.432     Thu_Dec_3_15:59:56  bos.obj.ssp.432     aix432 
       PSSP-3.1  rootvg 
    7  sp3n07.msc.itso.  10005AFA13D1      1  disk      hdisk0 
       bos.obj.ssp.432     Thu_Dec_3_16:05:20  bos.obj.ssp.432     aix432 
       PSSP-3.1  rootvg 
    8  sp3n08.msc.itso.  10005AFA0447      1  disk      hdisk0 
       bos.obj.ssp.432     Thu_Dec_3_15:53:33  bos.obj.ssp.432     aix432 
       PSSP-3.1  rootvg 
    9  sp3n09.msc.itso.  10005AFA158A      1  disk      hdisk0 
       bos.obj.ssp.432     Thu_Dec_3_15:56:28  bos.obj.ssp.432     aix432 
       PSSP-3.1  rootvg 
   10  sp3n10.msc.itso.  10005AFA159D      1  disk      hdisk0 
       bos.obj.ssp.432     Fri_Dec_4_10:25:44  bos.obj.ssp.432     aix432 
       PSSP-3.1  rootvg 
   11  sp3n11.msc.itso.  10005AFA147C      1  disk      hdisk0 
       bos.obj.ssp.432     Thu_Dec_3_15:59:57  bos.obj.ssp.432     aix432 
       PSSP-3.1  rootvg 
   12  sp3n12.msc.itso.  10005AFA0AB5      1  disk      hdisk0 
       bos.obj.ssp.432     Thu_Dec_3_15:55:29  bos.obj.ssp.432     aix432 
       PSSP-3.1  rootvg 
   13  sp3n13.msc.itso.  10005AFA1A92      1  disk      hdisk0 
       bos.obj.ssp.432     Thu_Dec_3_16:07:48  bos.obj.ssp.432     aix432 
       PSSP-3.1  rootvg 
   14  sp3n14.msc.itso.  10005AFA0333      1  disk      hdisk0 
       bos.obj.ssp.432     Thu_Dec_3_16:08:31  bos.obj.ssp.432     aix432 
       PSSP-3.1  rootvg 
   15  sp3n15.msc.itso.  02608C2E7785      1  install   hdisk0 
       bos.obj.ssp.432     Thu_Dec_3_16:05:03  bos.obj.ssp.432     aix432 
       PSSP-3.1  rootvg 
   17  sp3n17.msc.itso.  02608C2E86CA      0  install   hdisk0 
       initial             initial             default             default 
       PSSP-3.1  rootvg 
   21  sp3n21.msc.itso.  10005AFA0518     17  install   hdisk0 
       initial             initial             default             default 
       PSSP-3.1  rootvg 
   22  sp3n22.msc.itso.  10005AFA17E3     17  install   hdisk0 
       initial             initial             default             default 
       PSSP-3.1  rootvg 
   23  sp3n23.msc.itso.  10005AFA1721     17  install   hdisk0 
       initial             initial             default             default 
       PSSP-3.1  rootvg 
   24  sp3n24.msc.itso.  10005AFA07DF     17  install   hdisk0 
       initial             initial             default             default 
       PSSP-3.1  rootvg 
   25  sp3n25.msc.itso.  0004AC4947E9     17  install   hdisk0 
       initial             initial             default             default 
       PSSP-3.1  rootvg 
   26  sp3n26.msc.itso.  0004AC494B40     17  install   hdisk0 
       initial             initial             default             default 
       PSSP-3.1  rootvg 
   27  sp3n27.msc.itso.  02608C2E7643     17  install   hdisk0 
       initial             initial             default             default 
       PSSP-3.1  rootvg 
   29  sp3n29.msc.itso.  02608C2E7C1E     17  install   hdisk0 
       initial             initial             default             default 
       PSSP-3.1  rootvg 
   31  sp3n31.msc.itso.  02608C2E78C9     17  install   hdisk0 
       initial             initial             default             default 
       PSSP-3.1  rootvg 


8. Configure additional adapters for nodes to create adapter objects in the 
SDR with the command spadaptrs. You can only configure Ethernet (en), 
FDDI (fi), Token Ring (tr), and css0 (applies to the SP Switch) with this 
command. To configure adapters, such as ESCON and PCA, you must 
configure the adapter manually on each node using dsh or modify the 
firstboot.cust file. 

For the en1 adapter, enter: 

[sp3en0:/]# spadaptrs -s no -t bnc -l 17 en1 192.168.32.117 255.255.255.0 

For the css0 (SP Switch) adapter, enter: 

[sp3en0:/]# spadaptrs -s no -n no -a yes -l 17 css0 192.168.13.17 255.255.255.0 
[sp3en0:/]# spadaptrs -s no -n no -a yes -l 21 css0 192.168.13.21 255.255.255.0 
[sp3en0:/]# spadaptrs -s no -n no -a yes -l 22 css0 192.168.13.22 255.255.255.0 
[sp3en0:/]# spadaptrs -s no -n no -a yes -l 23 css0 192.168.13.23 255.255.255.0 
[sp3en0:/]# spadaptrs -s no -n no -a yes -l 24 css0 192.168.13.24 255.255.255.0 
[sp3en0:/]# spadaptrs -s no -n no -a yes -l 25 css0 192.168.13.25 255.255.255.0 
[sp3en0:/]# spadaptrs -s no -n no -a yes -l 26 css0 192.168.13.26 255.255.255.0 
[sp3en0:/]# spadaptrs -s no -n no -a yes -l 27 css0 192.168.13.27 255.255.255.0 
[sp3en0:/]# spadaptrs -s no -n no -a yes -l 29 css0 192.168.13.29 255.255.255.0 
[sp3en0:/]# spadaptrs -s no -n no -a yes -l 31 css0 192.168.13.31 255.255.255.0 

If you specify the -s flag to skip IP addresses when you are setting the 
css0 switch addresses, you must also specify -n no to not use switch 
numbers for IP address assignment and -a yes to use ARP. 


- Note - 

In PSSP V2.4 or earlier, the spadaptrs command supports only two adapters 
for Ethernet (en), FDDI (fi), and Token Ring (tr). However, with PTFs 
(ssp.basic.2.4.0.4) on PSSP 2.4, or with PSSP 3.1, it supports as many 
adapters as you can have in the system. 


9. Configure initial host names for nodes to change the default host name 
information in the SDR node objects with the command sphostnam. The 
default is the long form of the en0 host name, which is how the spethernt 
command processes defaulted host names. However, we set the 
host name to the short name: 

[sp3en0:/]# sphostnam -a en0 -f short -l 17,21,22,23,24,25,26,27,29,31 

10.Set up nodes to be installed. 


- Note - 

You cannot export /usr or any directories below /usr because an NFS 
export problem will occur. If you have exported the 
/spdata/sys1/install/image directory or any parent directory, you must 
unexport it using the exportfs -u command before running setup_server. 


From the output of step 7, we need to change the image name and AIX 
version. In addition, we have verified that node sp3n17 points to the CWS 
as its boot/install server and that all the remaining nodes point to sp3n17 
as their boot/install server, which is the default in a multi-frame 
environment. However, if you need to select a different node as the 
boot/install server, you can use the -n option of the spchvgobj command. 

To change this information in the SDR, enter: 

[sp3en0:/]# spchvgobj -r rootvg -i bos.obj.ssp.432 -l 17,21,22,23,24,25,26,27,29,31 

A sample output looks like: 

spchvgobj: Successfully changed the Node and Volume_Group objects for 
node number 17, volume group rootvg. 

spchvgobj: Successfully changed the Node and Volume_Group objects for 
node number 21, volume group rootvg. 

spchvgobj: Successfully changed the Node and Volume_Group objects for 
node number 22, volume group rootvg. 

spchvgobj: Successfully changed the Node and Volume_Group objects for 
node number 23, volume group rootvg. 

spchvgobj: Successfully changed the Node and Volume_Group objects for 
node number 24, volume group rootvg. 

spchvgobj: Successfully changed the Node and Volume_Group objects for 
node number 25, volume group rootvg. 

spchvgobj: Successfully changed the Node and Volume_Group objects for 
node number 26, volume group rootvg. 

spchvgobj: Successfully changed the Node and Volume_Group objects for 
node number 27, volume group rootvg. 

spchvgobj: Successfully changed the Node and Volume_Group objects for 
node number 29, volume group rootvg. 

spchvgobj: Successfully changed the Node and Volume_Group objects for 
node number 31, volume group rootvg. 

spchvgobj: The total number of changes successfully completed is 10. 
spchvgobj: The total number of changes which were not successfully 
completed is 0. 

Now run the command spbootins to run setup_server to configure the 
boot/install server. We installed sp3n17 first and the rest of the nodes 
later: 

[sp3en0:/]# spbootins -r install -l 17 

11. Refresh the system partition-sensitive subsystems on both the CWS and 
the nodes: 

[sp3en0:/]# syspar_ctrl -r -G 

12. Verify all node information with the splstdata command with the options 
-f, -n, -a, or -b. 

13. Change the default network tunable values (optional). 

If you set up the boot/install server, and it is acting as a gateway to the 
CWS, the ipforwarding must be enabled. To turn it on, issue: 

# /usr/sbin/no -o ipforwarding=1 

When a node is installed, migrated, or customized (set to customize and 
rebooted), and that node's boot/install server does not have a 
/tftpboot/tuning.cust file, a default file of system performance tuning 
variable settings in /usr/lpp/ssp/install/config/tuning.default is copied to 
/tftpboot/tuning.cust on that node. You can override these values by 
following one of the methods described in the following list: 

IBM supplies three alternate tuning files that contain initial performance 
tuning parameters for three different SP environments: 
/usr/lpp/ssp/install/config/tuning.commercial, tuning.development, and 
tuning.scientific. 

- Note - 

The S70, S70 Advanced, and S80 should not use the tuning.scientific file 
because of the large number of processors and the amount of traffic that 
they can generate. 


To select one of the sample tuning files, issue the cptuning command. It 
copies the file to /tftpboot/tuning.cust on the CWS, and the file is 
propagated from there to each node in the system when the node is 
installed, migrated, or customized. 

Note that each node inherits its tuning file from its boot/install server. 
Nodes that have as their boot/install server another node (other than the 
CWS) obtain their tuning.cust file from that server node; so, it is necessary 
to propagate the file to the server node before attempting to propagate it 
to the client node. 
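The ordering rule above (a boot/install server must receive tuning.cust before its clients) can be sketched as a small script. This is a minimal illustration, not a PSSP tool: the function name propagation_order is hypothetical, and the input format of one "node server" pair per line is an assumption (for example, taken from the node# and srvr columns of splstdata -b, where server 0 is the CWS).

```sh
#!/bin/sh
# Prints node numbers in an order where every boot/install server
# appears before the nodes it serves. Server 0 stands for the CWS.
# propagation_order is a hypothetical helper, not a PSSP command.
propagation_order() {
    awk '
        { srv[$1] = $2; order[NR] = $1 }
        END {
            seen[0] = 1                    # the CWS already has the file
            do {
                progress = 0
                for (i = 1; i <= NR; i++) {
                    node = order[i]
                    # A node is ready once its server has been handled.
                    if (!(node in seen) && (srv[node] in seen)) {
                        print node
                        seen[node] = 1
                        progress = 1
                    }
                }
            } while (progress)
        }
    '
}

# Node 17 is served by the CWS; nodes 21 and 22 are served by node 17.
printf '%s\n' '21 17' '22 17' '17 0' | propagation_order
```

In the usage line, node 17 is printed before nodes 21 and 22, even though it appears last in the input.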

14. Perform additional node customization, such as adding installp images, 
configuring host names, setting up NFS, AFS, or NIS, and configuring 
adapters that are not configured automatically (optional). 

The script.cust script is run from the PSSP NIM customization script 
(pssp_script) after the node's AIX and PSSP software have been installed 
but before the node has been rebooted. This script is run in a limited 
environment where not all services are fully configured. Because of this 
limited environment, you should restrict the use of the script.cust function. 
The function must be performed prior to the post-installation reboot of the 
node. 

The firstboot.cust script is run during the first boot of the node 
immediately after it has been installed. This script runs better in an 
environment where most of the services have been fully configured. 

15. Additional switch configuration (optional) 

If you have added a frame with a switch, perform the following steps: 

1. Select a topology file from the /etc/SP directory on the CWS. 


- Note - 

SP-Attached servers never contain a node switch board. Therefore, never 
include non-SP frames when determining your topology files. 


2. Manage the switch topology files 

The switch topology file must be stored in the SDR. The switch 
initialization code uses the topology file stored in the SDR when 
starting the switch (Estart). When the switch topology file is selected 
for your system's switch configuration, it must be annotated with 
Eannotator and then stored in the SDR with Etopology. The switch topology 
file stored in the SDR can be overwritten by having an expected.top file 
in /etc/SP on the primary node. Estart always checks for an 
expected.top file in /etc/SP before using the one stored in the SDR. 
The expected.top file is used when debugging or servicing the switch. 
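The lookup order described above can be mirrored in a few lines of shell when you want to know which topology Estart would pick up on the primary node. This is only an illustrative sketch of the rule as stated here; Estart performs the real check itself, and the function name topology_source is hypothetical.

```sh
#!/bin/sh
# Reports which topology source takes precedence according to the rule
# above: an expected.top file in the given directory overrides the SDR copy.
# topology_source is a hypothetical helper, not a PSSP command.
topology_source() {
    dir=${1:-/etc/SP}
    if [ -f "$dir/expected.top" ]; then
        echo "file: $dir/expected.top"
    else
        echo "SDR"
    fi
}

# Run on the primary node; the directory defaults to /etc/SP.
topology_source /etc/SP
```

If the script reports a file during normal operation, a leftover expected.top from an earlier debugging or service action may be overriding the topology stored in the SDR.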

3. Annotate a switch topology file with the command: Eannotator 

Annotate a switch topology file before storing it in the SDR. Use the 
Eannotator command to update the switch topology file's connection 
labels with their correct physical locations. Use the -O yes flag to store 
the switch topology file in the SDR: 

[sp3en0:/etc]# Eannotator -F /etc/SP/expected.top.2nsb.0isb.0 -f /etc/SP/expected.top.annotated -O yes 

4. Set the switch clock source for all switches with the command: Eclock 

For our environment, select /etc/SP/Eclock.top.2nsb.0isb.0 as the 
topology file and enter: 

[sp3en0:/]# Eclock -f /etc/SP/Eclock.top.2nsb.0isb.0 

To verify the switch configuration information, enter: 

[sp3en0:/]# splstdata -s 

                  List Node Switch Information 

                          switch  switch    switch  switch  switch 
node#  initial_hostname   node#   protocol  number  chip    chip_port 

    1  sp3n01.msc.itso.       0   IP             1     5       3 
    5  sp3n05.msc.itso.       4   IP             1     5       1 
    6  sp3n06.msc.itso.       5   IP             1     5       0 
    7  sp3n07.msc.itso.       6   IP             1     6       2 
    8  sp3n08.msc.itso.       7   IP             1     6       3 
    9  sp3n09.msc.itso.       8   IP             1     4       3 
   10  sp3n10.msc.itso.       9   IP             1     4       2 
   11  sp3n11.msc.itso.      10   IP             1     7       0 
   12  sp3n12.msc.itso.      11   IP             1     7       1 
   13  sp3n13.msc.itso.      12   IP             1     4       1 
   14  sp3n14.msc.itso.      13   IP             1     4       0 
   15  sp3n15.msc.itso.      14   IP             1     7       2 
   17  sp3n17.msc.itso.      16   IP             2     5       3 
   21  sp3n21.msc.itso.      20   IP             2     5       1 
   22  sp3n22.msc.itso.      21   IP             2     5       0 
   23  sp3n23.msc.itso.      22   IP             2     6       2 
   24  sp3n24.msc.itso.      23   IP             2     6       3 
   25  sp3n25.msc.itso.      24   IP             2     4       3 
   26  sp3n26.msc.itso.      25   IP             2     4       2 
   27  sp3n27.msc.itso.      26   IP             2     7       0 
   29  sp3n29.msc.itso.      28   IP             2     4       1 
   31  sp3n31.msc.itso.      30   IP             2     7       2 

switch  frame   slot    switch_partition  switch  clock  switch 
number  number  number  number            type    input  level 

     1       1      17                 1     129      0 
     2       2      17                 1     129      3 

switch_part  topology         primary           arp      switch_node 
number       filename         name              enabled  nos. used 

          1  expected.top.an  sp3n05.msc.itso.  yes      no 


16. Network boot the boot/install server node sp3n17. 

1. To monitor installation progress by opening the node's read-only 
console, issue: 

[sp3en0:/]# s1term 2 1 

2. To network boot sp3n17, issue: 

[sp3en0:/]# nodecond 2 1 & 

Monitor /var/adm/SPlogs/spmon/nc/nc.<frame_number>.<node_number> 
and check the /var/adm/SPlogs/sysman/<node>.console.log file on the 
boot/install node to see if setup_server has completed. 

17. Verify that system management tools were correctly installed on the 
boot/install servers. Now that the boot/install servers are powered up, run 
the verification test from the CWS to check for correct installation of the 
system management tools on these nodes. 

To do this, enter: 

[sp3en0:/]# SYSMAN_test 

After the tests are run, the system creates a log in /var/adm/SPlogs called 
SYSMAN_test.log. 

18. After you install the boot/install server, run the command spbootins to run 
setup_server for the rest of the nodes: 

[sp3en0:/etc/]# spbootins -r install -l 21,22,23,24,25,26,27,29,31 

The sample output is as follows: 

setup_server command results from sp3en0 

setup_server: Running services_config script to configure SSP services.This may take a few minutes... 
rc.ntp: NTP already running - not starting ntp 
0513-029 The supfilesrv Subsystem is already active. 
Multiple instances are not supported. 
/etc/auto/startauto: The automount daemon is already running on this system. 
setup_CWS: Control Workstation setup complete. 
mknimmast: Node 0 (sp3en0) already configured as a NIM master. 
create_krb_files: tftpaccess.ctl file and client srvtab files created/updated 
on server node 0. 
mknimres: Copying /usr/lpp/ssp/install/bin/pssp_script to /spdata/sys1/install/pssp/pssp_script. 
mknimres: Copying /usr/lpp/ssp/install/config/bosinst_data.template to /spdata/sys1/install/pssp/bosinst_data. 
mknimres: Copying /usr/lpp/ssp/install/config/bosinst_data_prompt.template to /spdata/sys1/install/pssp/bosinst_data_prompt. 
mknimres: Copying /usr/lpp/ssp/install/config/bosinst_data_migrate.template to /spdata/sys1/install/pssp/bosinst_data_migrate. 
mknimclient: 0016-242: Client node 1 (sp3n01.msc.itso.ibm.com) already defined on server node 0 (sp3en0). 
mknimclient: 0016-242: Client node 17 (sp3n17.msc.itso.ibm.com) already defined on server node 0 (sp3en0). 
export_clients: File systems exported to clients from server node 0. 
allnimres: Node 1 (sp3n01.msc.itso.ibm.com) prepared for operation: disk. 
allnimres: Node 17 (sp3n17.msc.itso.ibm.com) prepared for operation: disk. 
setup_server: Processing complete (rc= 0). 

setup_server command results from sp3n01.msc.itso.ibm.com 

setup_server: Running services_config script to configure SSP services.This may take a few minutes... 
rc.ntp: NTP already running - not starting ntp 
supper: Active volume group rootvg. 
Updating collection sup.admin from server sp3en0.msc.itso.ibm.com. 
File Changes: 0 updated, 0 removed, 0 errors. 
Updating collection user.admin from server sp3en0.msc.itso.ibm.com. 
File Changes: 6 updated, 0 removed, 0 errors. 
Updating collection power_system from server sp3en0.msc.itso.ibm.com. 
File Changes: 0 updated, 0 removed, 0 errors. 
Updating collection node.root from server sp3en0.msc.itso.ibm.com. 
File Changes: 0 updated, 0 removed, 0 errors. 
0513-029 The supfilesrv Subsystem is already active. 
Multiple instances are not supported. 
/etc/auto/startauto: The automount daemon is already running on this system. 
mknimmast: Node 1 (sp3n01.msc.itso.ibm.com) already configured as a NIM master. 
create_krb_files: tftpaccess.ctl file and client srvtab files created/updated 
on server node 1. 
mknimres: Copying /usr/lpp/ssp/install/bin/pssp_script to /spdata/sys1/install/pssp/pssp_script. 
mknimres: Copying /usr/lpp/ssp/install/config/bosinst_data.template to /spdata/sys1/install/pssp/bosinst_data. 
mknimres: Copying /usr/lpp/ssp/install/config/bosinst_data_prompt.template to /spdata/sys1/install/pssp/bosinst_data_prompt. 
mknimres: Copying /usr/lpp/ssp/install/config/bosinst_data_migrate.template to /spdata/sys1/install/pssp/bosinst_data_migrate. 
mknimclient: 0016-242: Client node 5 (sp3n05.msc.itso.ibm.com) already defined on server node 1 (sp3n01.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 6 (sp3n06.msc.itso.ibm.com) already defined on server node 1 (sp3n01.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 7 (sp3n07.msc.itso.ibm.com) already defined on server node 1 (sp3n01.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 8 (sp3n08.msc.itso.ibm.com) already defined on server node 1 (sp3n01.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 9 (sp3n09.msc.itso.ibm.com) already defined on server node 1 (sp3n01.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 10 (sp3n10.msc.itso.ibm.com) already defined on server node 1 (sp3n01.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 11 (sp3n11.msc.itso.ibm.com) already defined on server node 1 (sp3n01.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 12 (sp3n12.msc.itso.ibm.com) already defined on server node 1 (sp3n01.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 13 (sp3n13.msc.itso.ibm.com) already defined on server node 1 (sp3n01.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 14 (sp3n14.msc.itso.ibm.com) already defined on server node 1 (sp3n01.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 15 (sp3n15.msc.itso.ibm.com) already defined on server node 1 (sp3n01.msc.itso.ibm.com). 
export_clients: File systems exported to clients from server node 1. 
allnimres: Node 5 (sp3n05.msc.itso.ibm.com) prepared for operation: disk. 
allnimres: Node 6 (sp3n06.msc.itso.ibm.com) prepared for operation: disk. 
allnimres: Node 7 (sp3n07.msc.itso.ibm.com) prepared for operation: disk. 
allnimres: Node 8 (sp3n08.msc.itso.ibm.com) prepared for operation: disk. 
allnimres: Node 9 (sp3n09.msc.itso.ibm.com) prepared for operation: disk. 
allnimres: Node 10 (sp3n10.msc.itso.ibm.com) prepared for operation: disk. 
allnimres: Node 11 (sp3n11.msc.itso.ibm.com) prepared for operation: disk. 
allnimres: Node 12 (sp3n12.msc.itso.ibm.com) prepared for operation: disk. 
allnimres: Node 13 (sp3n13.msc.itso.ibm.com) prepared for operation: disk. 
allnimres: Node 14 (sp3n14.msc.itso.ibm.com) prepared for operation: disk. 
allnimres: Node 15 (sp3n15.msc.itso.ibm.com) prepared for operation: disk. 
setup_server: Processing complete (rc= 0). 

setup_server command results from sp3n17.msc.itso.ibm.com 

setup_server: Running services_config script to configure SSP services.This may take a few minutes... 
rc.ntp: NTP already running - not starting ntp 
supper: Active volume group rootvg. 
Updating collection sup.admin from server sp3en0.msc.itso.ibm.com. 
File Changes: 0 updated, 0 removed, 0 errors. 
Updating collection user.admin from server sp3en0.msc.itso.ibm.com. 
File Changes: 6 updated, 0 removed, 0 errors. 
Updating collection power_system from server sp3en0.msc.itso.ibm.com. 
File Changes: 0 updated, 0 removed, 0 errors. 
Updating collection node.root from server sp3en0.msc.itso.ibm.com. 
File Changes: 0 updated, 0 removed, 0 errors. 
0513-029 The supfilesrv Subsystem is already active. 
Multiple instances are not supported. 
/etc/auto/startauto: The automount daemon is already running on this system. 
mknimmast: Node 17 (sp3n17.msc.itso.ibm.com) already configured as a NIM master. 
create_krb_files: tftpaccess.ctl file and client srvtab files created/updated 
on server node 17. 
mknimres: Copying /usr/lpp/ssp/install/bin/pssp_script to /spdata/sys1/install/pssp/pssp_script. 
mknimres: Copying /usr/lpp/ssp/install/config/bosinst_data.template to /spdata/sys1/install/pssp/bosinst_data. 
mknimres: Copying /usr/lpp/ssp/install/config/bosinst_data_prompt.template to /spdata/sys1/install/pssp/bosinst_data_prompt. 
mknimres: Copying /usr/lpp/ssp/install/config/bosinst_data_migrate.template to /spdata/sys1/install/pssp/bosinst_data_migrate. 
mknimclient: 0016-242: Client node 21 (sp3n21.msc.itso.ibm.com) already defined on server node 17 (sp3n17.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 22 (sp3n22.msc.itso.ibm.com) already defined on server node 17 (sp3n17.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 23 (sp3n23.msc.itso.ibm.com) already defined on server node 17 (sp3n17.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 24 (sp3n24.msc.itso.ibm.com) already defined on server node 17 (sp3n17.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 25 (sp3n25.msc.itso.ibm.com) already defined on server node 17 (sp3n17.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 26 (sp3n26.msc.itso.ibm.com) already defined on server node 17 (sp3n17.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 27 (sp3n27.msc.itso.ibm.com) already defined on server node 17 (sp3n17.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 29 (sp3n29.msc.itso.ibm.com) already defined on server node 17 (sp3n17.msc.itso.ibm.com). 
mknimclient: 0016-242: Client node 31 (sp3n31.msc.itso.ibm.com) already defined on server node 17 (sp3n17.msc.itso.ibm.com). 
export_clients: File systems exported to clients from server node 17. 
allnimres: Node 21 (sp3n21.msc.itso.ibm.com) prepared for operation: install. 
allnimres: Node 22 (sp3n22.msc.itso.ibm.com) prepared for operation: install. 
allnimres: Node 23 (sp3n23.msc.itso.ibm.com) prepared for operation: install. 
allnimres: Node 24 (sp3n24.msc.itso.ibm.com) prepared for operation: install. 
allnimres: Node 25 (sp3n25.msc.itso.ibm.com) prepared for operation: install. 
allnimres: Node 26 (sp3n26.msc.itso.ibm.com) prepared for operation: install. 
allnimres: Node 27 (sp3n27.msc.itso.ibm.com) prepared for operation: install. 
allnimres: Node 29 (sp3n29.msc.itso.ibm.com) prepared for operation: install. 
allnimres: Node 31 (sp3n31.msc.itso.ibm.com) prepared for operation: install. 
setup_server: Processing complete (rc= 0). 

19. Network boot the rest of the nodes: 

[sp3en0:/]# nodecond 2 5 & 

Then network boot the remaining nodes in the same way. Monitor 
/var/adm/SPlogs/spmon/nc/nc.<frame_number>.<node_number> and 
check the /var/adm/SPlogs/sysman/<node>.console.log file on the 
boot/install node to see if setup_server has completed. 
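The nodecond command takes a frame number and a slot number, while the node lists used earlier take node numbers. The following is a minimal dry-run sketch of the conversion, assuming the standard SP numbering of 16 slots per frame (which matches node 17 being frame 2, slot 1, and node 21 being frame 2, slot 5, in this example). The function names are hypothetical, and the script only prints the nodecond commands; it does not run them.

```sh
#!/bin/sh
# Converts an SP node number to "frame slot", assuming 16 slots per frame.
# node_to_frame_slot and gen_nodecond are hypothetical helpers.
node_to_frame_slot() {
    node=$1
    frame=$(( (node - 1) / 16 + 1 ))
    slot=$(( (node - 1) % 16 + 1 ))
    echo "$frame $slot"
}

# Dry run: print one nodecond command per remaining node.
gen_nodecond() {
    for n in 21 22 23 24 25 26 27 29 31; do
        set -- $(node_to_frame_slot "$n")
        echo "nodecond $1 $2 &"
    done
}

gen_nodecond
```

The same frame and slot pair also identifies the nc.<frame_number>.<node_number> log file to monitor for each node.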

20. Verify node installation. 

To check the hostResponds and powerLED indicators for each node, 
enter: 

[sp3en0:/]# spmon -d -G 

21. Start the switch with the following command after all nodes are installed: 

[sp3en0:/]# Estart 

Estart: Oncoming primary != primary, Estart directed to oncoming primary 

Estart: 0028-061 Estart is being issued to the primary node: sp3n05.msc.itso.ibm.com. 

Switch initialization started on sp3n05.msc.itso.ibm.com. 

Initialized 14 node(s). 

Switch initialization completed. 

If you have set up system partitions, do this step in each partition. 

22. Verify that the switch was installed correctly by running a verification 
test. To do this, enter: 

[sp3en0:/]# CSS_test 

After the tests are run, the system creates a log in /var/adm/SPlogs called 
CSS_test.log. To check the switchResponds and powerLED indicators for 
each node, enter: 

[sp3en0:/]# spmon -d -G 

23. Customize the node just installed: 

• Update .profile with proper PSSP command paths. 

• Get the Kerberos ticket with the command: k4init root.admin, and so 
on. 


15.5 Adding existing S70 to an SP system 

If you want to preserve the environment of your existing S70, S7A, or S80 
server, perform the following steps to add an SP-Attached server and 
preserve your existing software environment: 

1. Upgrade AIX: If your SP-Attached server is not at AIX 4.3.2, you must first 
upgrade to that level of AIX before proceeding. 

2. Set up name resolution of the SP-Attached server: In order to do PSSP 
customization, the following must be resolvable on the SP-Attached 
server: 

• The control workstation host name. 

• The name of the boot/install server's interface that is attached to the 
SP-Attached server's en0 interface. 

3. Set up routing to the CWS host name: If you have a default route set up on 
the SP-Attached server, you must delete it. If you do not remove the 
route, customization fails when it tries to set up the default route defined 
in the SDR. For customization to occur, you must define a static 
route to the control workstation's host name. For example, if the control 
workstation's host name is its token ring address, such as 9.114.73.76, 
and your gateway is 9.114.73.256, enter: 

# route add -host 9.114.73.76 9.114.73.256 

4. FTP the SDR_dest_info file: During customization, certain information will 
be read from the SDR. In order to get to the SDR, you must FTP the 
/etc/SDR_dest_info file from the control workstation to the 
/etc/SDR_dest_info file on the SP-Attached server and check the mode 
and ownership of the file. 

5. Verify perfagent: Ensure that perfagent.tools 2.2.32.x is installed on your 
SP-Attached server. 

6. Mount the pssplpp directory: Mount the /spdata/sys1/install/pssplpp 
directory on the boot/install server from the SP-Attached server. For 
example, issue: 

# mount sp3en0:/spdata/sys1/install/pssplpp /mnt 

7. Install ssp.basic and its prerequisites onto the SP-Attached server: 

# installp -aXgd /mnt/PSSP-3.1 ssp.basic 2>&1 | tee /tmp/install.log 

8. Unmount the /spdata/sys1/install/pssplpp directory on the boot/install 
server from the SP-Attached server: 

# umount /mnt 

9. Run pssp_script: Run the pssp_script by issuing: 

# /usr/lpp/ssp/install/bin/pssp_script 

10. Reboot: Perform a reboot: 

# shutdown -Fr 


15.6 Adding a switch 

This procedure was already summarized as part of the previous section. 
Here, however, we cover the following two cases in which you add only the 
SP Switch: 

• Adding a switch to a switchless system 

• Adding a switch to a system with existing switches 

15.6.1 Adding a switch to a switchless system 

Perform the following to add a switch to a switchless system: 

1. Redefine the system to a single partition. 

Refer to the RS/6000: Planning Volume 2, GA22-7281, for more 
information. 

2. Install the level of communication subsystem software (ssp.css) on the 
CWS with the command: installp 

3. Install the new switch. 

Your IBM Customer Engineer (CE) performs this step. This step may 
include installing the switch adapters and installing a new frame 
supervisor card. 

4. Create the switch partition class with the following command: 

# Eprimary -init 

5. Check and update the state of the supervisor microcode with the 
command: spsvrmgr 

6. Configure the switch adapters for each node with the spadaptrs command 
to create css0 adapter objects in the SDR for each new node. 

7. Reconfigure the hardware monitor to recognize the new switch. 

To do this, enter: 

# hmcmds -G setid 1:0 

8. Update the System Data Repository (SDR). 

To update the SDR switch information, issue the following command: 

# /usr/lpp/ssp/install/bin/hmreinit 

9. Set up the switch. 

Refer to step 15 in 15.4, “Adding a node” on page 410.

10. Refresh the system partition-sensitive subsystems by issuing the
following command on the CWS after adding the switch:

# syspar_ctrl -r -G

11. Set the nodes to customize with the following command:

# spbootins -r customize -l <node_list>

12. Reboot all the nodes for node customization.

13. Start up the switch with Estart and verify the switch.
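
The CWS-side commands in the steps above can be collected into a dry-run checklist. The following is a minimal sketch that only prints the commands in order and executes nothing. The function name print_switch_plan and the node list 1,5,9 are hypothetical. The installp, spsvrmgr, and spadaptrs steps are left out because their arguments depend on your configuration.

```shell
#!/bin/sh
# Dry-run sketch: print the CWS-side commands for adding a switch to a
# switchless system, in the order given in the steps above.
# Nothing is executed; the node list argument is a hypothetical example.
print_switch_plan() {
    node_list=$1
    echo "Eprimary -init"
    echo "hmcmds -G setid 1:0"
    echo "/usr/lpp/ssp/install/bin/hmreinit"
    echo "syspar_ctrl -r -G"
    echo "spbootins -r customize -l ${node_list}"
    echo "Estart"
}

print_switch_plan "1,5,9"
```

Review the printed list against the installation guide before issuing any of the commands by hand.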

15.6.2 Adding a switch to a system with existing switches 

Perform the following steps to add a switch to a system with existing switches: 

1. Redefine the system to a single partition. 

2. Install the new switch. 

Your IBM Customer Engineer (CE) performs this step. This step includes 
installing the switch adapters and installing new frame supervisors. 

3. Check and update the state of the supervisor microcode with the 
command: spsvrmgr 

4. Configure the adapters for each node with the spadaptrs command to
create css0 adapter objects in the SDR for each new node.

5. Set up the switch.

Refer to step 15 in 15.4, “Adding a node” on page 410.

6. Refresh system partition-sensitive subsystems with the following 
command on the CWS after adding the switch: 

# syspar_ctrl -r -G 

7. Set the nodes to customize with the following command:

# spbootins -r customize -l <node_list>

8. Reboot all the nodes for node customization.

9. Start up the switch with Estart and verify the switch. 


15.7 Replacing to PCI-based 332 MHz SMP node 

This migration scenario is summarized only for preparing for the exam and 
will not provide full information for conducting an actual migration. However, 
this section will provide enough information to understand the migration 
process. 

15.7.1 Assumptions 

• There is only one partition in the SP system. 

• All nodes being upgraded are required to be installed with a current 
mksysb image. Note that logical names for devices on the new 332 MHz 
SMP node will most likely not be the same as on the legacy node. This is 
because the 332 MHz SMP node will be freshly installed and is a different 
technology. 

• The node we are migrating is not a boot/install server node. 

• HACWS is not implemented. 

• Install AIX Version 4.3.2 and PSSP Version 3.1. 

15.7.2 Software requisites 

Getting the correct software acquired, copied, and installed can be a most 
complex task in any SP installation. Migrating to the 332 MHz SMP node and 
PSSP 2.4 or PSSP 3.1 is no exception. The basic facts surrounding proper 
software prerequisites and installation are: 

• Required base level AIX filesets and all PTFs should be installed on the 
CWS and nodes. 

• Required base level AIX filesets and all PTFs should be copied and 
available in /spdata/sys1/install/aix432/lppsource. 

• Required base level AIX filesets should be built into the appropriate 
SPOT. 

• Required AIX fixes should be used to customize the appropriate SPOT. 

• Required PSSP fixes should be copied into the PSSP directory
(/spdata/sys1/install/pssplpp/PSSP-3.1).

15.7.2.1 PSSP code

A general recommendation is to install all the latest level fixes during a
migration. This includes both the CWS and the nodes. The fixes will not be
installed by default even if properly placed in the
/spdata/sys1/install/pssplpp/PSSP-3.1 directory. You must explicitly specify
Install at latest available level for the CWS and modify the
/tftpboot/script.cust file to install the fixes on the nodes.

15.7.2.2 Mirroring considerations 

Nodes with pre-PSSP V3.1 on which the rootvg VG has been mirrored are at 
risk of losing the mirroring on that node if the information regarding the 
mirroring is not entered into the SDR prior to migrating that node to PSSP 
3.1. Failure to update this information in the SDR will result in the VG being 
unmirrored. 

15.7.2.3 Migration and coexistence considerations 

Table 29 shows service that must be applied to your existing SP system prior 
to migrating your CWS to PSSP 3.1. Coexistence also requires this service. 

Table 29. Required service PTF set for migration

PSSP Level     PTF Set Required

PSSP 2.2       PTF Set 20

PSSP 2.3       PTF Set 12

PSSP 2.4       PTF Set 5
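
Table 29 can be expressed as a small lookup for use in a pre-migration check script. This is a sketch only; the function name required_ptf_set is hypothetical, and it covers just the three levels listed in the table.

```shell
#!/bin/sh
# Map an existing PSSP level to the PTF set that Table 29 requires
# before the CWS is migrated to PSSP 3.1.
required_ptf_set() {
    case "$1" in
        2.2) echo "PTF Set 20" ;;
        2.3) echo "PTF Set 12" ;;
        2.4) echo "PTF Set 5" ;;
        *)   echo "unknown PSSP level: $1" >&2; return 1 ;;
    esac
}

required_ptf_set 2.3
```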


15.7.3 Control workstation requirements 

The CWS has certain minimum memory requirements for PSSP 2.4 and
PSSP 3.1. This does not take into account other applications that may be 
running on the CWS (not recommended for performance reasons). 

15.7.3.1 AIX software configuration 

The required AIX software level is 4.2.1 or 4.3.1 for PSSP 2.4 and 4.3.2 for 
PSSP 3.1. There are also some required fixes at either level that will need to 
be installed. Refer to the Software Requisite section and the PSSP: 
Installation and Migration Guide, GC23-3898, for documentation of these 
levels. AIX must be at a supported level before PSSP can be installed. 
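
The level pairing stated above can be checked with a short helper. This is a sketch under the assumption that only the levels named in this section matter; the function name aix_level_ok is hypothetical, and required fixes are not checked.

```shell
#!/bin/sh
# Return 0 if the AIX level is one this section names for the PSSP level:
# PSSP 2.4 needs AIX 4.2.1 or 4.3.1, and PSSP 3.1 needs AIX 4.3.2.
aix_level_ok() {
    case "$1:$2" in
        2.4:4.2.1|2.4:4.3.1|3.1:4.3.2) return 0 ;;
        *) return 1 ;;
    esac
}

if aix_level_ok 3.1 4.3.2; then
    echo "AIX 4.3.2 is a supported base for PSSP 3.1"
fi
```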

15.7.3.2 PSSP software configuration 

PSSP 2.4 with PTF set 3 is the minimal required level of PSSP on the CWS in
order to have 332 MHz SMP Nodes. Please refer to the Software Requisites
section in PSSP: Installation and Migration Guide, GC23-3898, for the
specific levels that are required.

15.7.3.3 NIM configuration 

The NIM configuration on the CWS will also need to be updated to current
levels. Please refer to the Software Requisites section for information on the
lppsource and SPOT configuration. Note that any additional base operating
system filesets and related filesets that are installed on the existing nodes
should be in the lppsource directory.

15.7.4 Node migration 

This section summarizes the required steps to replace existing nodes with the 
new 332 MHz SMP nodes. This procedure can be done simultaneously on all 
nodes, or it can be performed over a period of time. The CWS will need to be 
upgraded before any nodes are replaced. The majority of the time will be 
spent in preparation and migration of the CWS and nodes to current levels of 
software and the necessary backups for the nodes being replaced. 

15.7.4.1 Phase I: Preparation on the CWS and existing nodes 

1. Plan any necessary client and server verification testing. 

2. Plan any external device verification (tape libraries, and so on). 

3. Capture all required node documentation. 

4. Capture all non-rootvg VG information. 

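5. A script may be written to back up the nodes. An example script is:

#!/usr/bin/ksh
# Back up rootvg of this node to the images directory on the CWS.

CWS=cws

DATE=$(date +%y%m%d)

NODE=$(hostname -s)

# NFS-mount the CWS images directory, write the mksysb image, and unmount.
/usr/sbin/mount ${CWS}:/spdata/sys1/install/images /mnt
/usr/bin/mksysb -i /mnt/bos.obj.${NODE}.${DATE}

/usr/sbin/umount /mnt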

6. Create a full system backup for each node. Some example commands 
are: 

# exportfs -i -o access=node1:node3,root=node1:node3 \
/spdata/sys1/install/images

# pcp -a /usr/local/bin/backup_nodes.ksh

# dsh -a /usr/local/bin/backup_nodes.ksh

7. Create system backup (rootvg) for the control workstation:

# mksysb -i /dev/rmt0

8. Copy required AIX filesets including PCI device filesets to the
/spdata/sys1/install/aix432/lppsource directory.

9. Copy required AIX fixes including PCI device fixes to the
/spdata/sys1/install/aix432/lppsource directory.

10. Copy PSSP to the /spdata/sys1/install/pssplpp/PSSP-3.1 directory.

11. Copy latest PSSP fixes to the /spdata/sys1/install/pssplpp/PSSP-3.1
directory.

12. Copy coexistence fixes to the /spdata/sys1/install/pssplpp/PSSP-3.1
directory if needed.

13. Create /spdata volume group backup:

# savevg -i /dev/rmt0 spdatavg
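
Before shutting down any node, it helps to confirm that each node backup image exists and is not empty. The following sketch assumes the bos.obj.<node>.<date> naming used by the example backup script; the function names and the sample node name are hypothetical.

```shell
#!/bin/sh
# Build the image name the example backup script would produce.
# $1 = node short hostname, $2 = date stamp in yymmdd form
backup_image_name() {
    echo "bos.obj.$1.$2"
}

# Return 0 only if the image file exists and has a size greater than zero.
backup_ok() {
    [ -s "$1" ]
}

backup_image_name sp3n05 "$(date +%y%m%d)"
```

On the CWS you might call backup_ok on /spdata/sys1/install/images/<image name> for each node in turn.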

15.7.4.2 Phase II: Perform on the existing nodes 

1. Perform the preparation steps. 

2. Upgrade AIX as required on the CWS. Do not forget to update the
SPOT if fixes were added to the lppsource directory. Perform an
SDRArchive before backing up the CWS. Take a backup of the CWS
after this is successfully completed.

3. Upgrade to the latest level of PSSP and latest fixes. If you plan on 
staying in this state for an extended period of time, you may need to 
install coexistence fixes on the nodes. These fixes allow nodes at 
earlier levels of PSSP to operate with a CWS at the latest level of 
PSSP. Take another backup of the CWS. 

4. Verify operation of the upgraded CWS with the nodes. Perform an
SDRArchive.

5. Upgrade PSSP and AIX (if needed) on the nodes that will be replaced 
by 332 MHz SMP nodes. Install the latest PSSP fixes. 

6. Verify operation of the nodes and back up the nodes after successful 
verification. Archive the SDR through the SDRArchive command. 

7. Shut down the original SP nodes that are being replaced. 

8. Remove the node definitions for the nodes being replaced using the
spdelnode command. This removes the old nodes from the SDR, since the
new configuration is guaranteed to be different. Now is the time to back
up the /spdata volume group on the CWS.

9. Bring in and physically install the new nodes. You will move all external
node connections from the original nodes to the new nodes.

15.7.4.3 Phase III: Rebuild SDR and install new 332 MHz SMP nodes

1. Rebuild SDR with all required node information on the CWS. 

2. Replace old nodes with new 332 MHz SMP nodes. Be careful to cable
networks, DASD, and tapes in the proper order (for example, ent1 on
the old SP node should be connected to what will be ent1 on the new
332 MHz SMP Node).

3. Netboot all nodes, being sure to select the correct AIX and PSSP levels.

4. Verify AIX and PSSP base code levels on nodes. 

5. Verify AIX and PSSP fix levels on nodes and upgrade if necessary. 

6. Verify node operation (/usr/lpp/ssp/install/bin/node_number, netstat 
-in). 

7. You will need the node documentation acquired during the preparation 
step. 

8. Perform any necessary external device configuration (tape drives, 
external volume groups, and so on). 

9. Perform any necessary client and server verification testing. 

10. Perform any external device verification (tape libraries, and so on). 

11. Create a full system backup for nodes.

12. Create a system backup (rootvg) for the CWS. 

13. Create a /spdata volume group backup. 


15.8 Related documentation 

We assume that you already have experience with the key commands and 
files from Chapter 7 and Chapter 8. The following IBM manuals will help you 
with a detailed procedure for reconfiguring your SP system. 

SP Manuals 

To reconfigure your SP system, you should have hands-on experience with 
initial planning and implementation. The manuals, RS/6000 SP: Planning Vol
1, Hardware and Physical Environment, GA22-7280, and RS/6000 SP: Planning
Volume 2, GA22-7281, give you a good description of what you need. For
details about reconfiguration of your SP system, you can refer to Chapter 5 of
the following two manuals: PSSP: Installation and Migration Guide,
GC23-3898, and PSSP Installation and Migration Guide, GA22-7347.



Other Sources 

Migrating to the RS/6000 SP 332 MHz SMP Node, IBM intranet: 

http://dscrs6k.aix.dfw.ibm.com/ 


15.9 Sample questions 

This section provides a series of questions to help aid you in preparation for 
the certification exam. The answers to these questions can be found in 
Appendix A. 

1. In order to change the css0 IP address or hostname, you should: (Select
more than one step.)

A. Delete and restore the NIM environment.

B. Remove the css0 information from the SDR and reload it.

C. Change the values as required in the SDR and DNS/hosts 
environment. 

D. Customize the nodes. 

2. Your site planning representative has asked if the upgraded frame has 
any additional or modified environmental requirements. Therefore: 

A. The upgraded frame requires increased power. 

B. The upgraded frame has a decreased footprint. 

C. The upgraded frame is taller. 

D. The upgraded frame requires water cooling. 

3. If you set up a boot/install server, and it is acting as a gateway to the 
control workstation, the ipforwarding must be enabled. Which of the 
following commands will you issue to turn it on? 

A. /usr/sbin/no -ip ipforwarding=2

B. /usr/sbin/no -1 ipforwarding=1

C. /usr/sbin/no -o ipforwarding=2

D. /usr/sbin/no -o ipforwarding=1

4. Which of the following statements is NOT an assumption when replacing a 
node to PCI-Based 332 MHz SMP node? 

A. Install AIX Version 4.3.2 and PSSP Version 3.1. 

B. HACWS is not implemented. 

C. The node we are migrating is not a boot/install server. 

D. There are two partitions in the SP system.

5. In order to update the microcode on the frame supervisor of frame 2,
which of the following commands will you use?

A. spsvrmgr -G -u 2:1 

B. spsvrmgr -G -u 2:0 

C. spsvrmgr -R -u 2:0 

D. spsvrmgr -R -u 2:1 


15.10 Exercises 

Here are some exercises you may wish to perform: 

1. Familiarize yourself with the steps required to add a frame. 

2. Describe the necessary steps to add an existing S70 to your SP 
environment. 

3. Familiarize yourself with the necessary steps required to add a node. 

4. Explore the steps to add a switch to a switchless system. 

5. Explore the steps to add a switch to a system with existing switches. 

6. Familiarize yourself with the node migration steps necessary to
upgrade/replace an existing node.

Chapter 16. Problem diagnosis


In this chapter, we discuss common problems related to node installation, SP 
user management, Kerberos, and SP Switches. In most of the sections, we 
start with the checklists, and the recovery for each problem is stated as 
actions rather than detailed procedures. Therefore, we recommend reading 
the related documents for detailed procedures to help you better understand 
each topic in order to resolve real world problems. 


16.1 Key concepts you should study 

This section gives you the key concepts you have to understand when you 
prepare for the certification exam on diagnosing problems of the RS/6000 SP. 
You should understand: 

• The basic SP hardware and software. 

• The basic SP implementation process and techniques to resolve common 
problems. 

• The overview of the setup_server wrapper, including NIM. 

• The network boot process and how to interpret its LED for common 
problems. 

• The mechanism of SP user management with automount and file 
collection and the techniques to resolve common problems. 

• The basic concept of Kerberos, its setup, and the techniques to resolve 
common problems. 

• The basic SP system connectivity and its related problems. 

• The different features on the 604 high node and its problems. 

• The basic SP switch operations and key commands. 

• The basic techniques to resolve common SP switch problems. 


16.2 Diagnosing node installation related problems 

We start this section by introducing two types of common problems when
installing the SP nodes: setup_server and network boot problems.

16.2.1 Diagnosing setup_server problems 

Problems with setup_server are complicated and require a reasonable
understanding of each wrapper, so it is hard to provide simple
checklists. However, setup_server writes clear error messages to
standard output while it runs, so observe the messages carefully and try
to understand them in order to solve the problems. The probable causes
of a setup_server failure usually fall into three types:

• Kerberos problems 

• SDR problems 

• NIM-related problems

Kerberos problems in setup_server are usually related to the Kerberos ticket.
Thus, we only discuss the problems with SDR and those that are NIM related.

Note that the setup_server script should run on the boot/install servers. If you
have a boot/install server other than the CWS, run setup_server through the
spbootins command with -s yes (which is the default) on the CWS.
setup_server then runs on each boot/install server using dsh and returns the
progress message output to the CWS.

16.2.1.1 Problems with the SDR 

The most common problem with the SDR during setup_server is that the
information within the SDR is not correct. You should also verify the
/etc/SDR_dest_info file and see if it is pointing to the correct partition IP
address. Then, check all the information in the SDR with the splstdata
command and its various options. One important SDR class for setup_server
is Syspar_map. Check it with the SDRGetObjects Syspar_map command to find
the problem.

16.2.1.2 Problems with NIM export 

When setup_server executes, the export_clients wrapper exports the
directories that hold the resources the NIM client needs to perform the
installation. Sometimes NIM cannot configure a NIM client because an
earlier NIM client definition was not entirely removed from the exported
directories it manages. Here is an example of a successful export, shown by
the exportfs command, for a NIM client, sp3n05, which is ready to be
installed:

# exportfs

/spdata/sys1/install/pssplpp -ro
/spdata/sys1/install/pssp/noprompt
/spdata/sys1/install/pssp/pssp_script
/spdata/sys1/install/images/bos.obj.min.432 -ro,root=sp3n05.msc.itso.ibm.com
/export/nim/scripts/sp3n05.script -ro,root=sp3n01.msc.itso.ibm.com
/spdata/sys1/install/aix432/lppsource -ro

A problem occurs if the NIM client is listed in some of these directories, but
the resource has not been allocated. This may happen if NIM has not
successfully removed the NIM client in a previous NIM command.

To resolve this, use the following procedure:

1. Check the exported file or directory with the command:

# exportfs

/spdata/sys1/install/pssplpp -ro
/spdata/sys1/install/aix432/lppsource -ro
/spdata/sys1/install/images/bos.obj.min.432 -ro,root=sp3n05

2. Un-export a file or directory with the exportfs -u command:

# exportfs -u /spdata/sys1/install/images/bos.obj.min.432

3. Verify that the exported directory has been removed from the export list:

# exportfs

/spdata/sys1/install/pssplpp -ro
/spdata/sys1/install/aix432/lppsource -ro

Once the NFS export has been corrected, you can issue setup_server on 
the NIM master to redefine the NIM client. 
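
Finding the exports that still name a client can be scripted. The following is a sketch that reads exportfs output on standard input and prints every exported path whose root= option names the given client. It assumes the two-column output format shown in the examples above; the function name exports_for_client is hypothetical.

```shell
#!/bin/sh
# Print exported paths whose root= option lists the given client.
# The client matches either as a short name or as the first label
# of a fully qualified hostname.
exports_for_client() {
    awk -v c="$1" '
        NF >= 2 {
            n = split($2, opts, ",")
            for (i = 1; i <= n; i++) {
                o = opts[i]
                sub(/^-/, "", o)                  # drop the leading dash
                if (index(o, "root=") != 1) continue
                m = split(substr(o, 6), hosts, ":")
                for (j = 1; j <= m; j++)
                    if (hosts[j] == c || index(hosts[j], c ".") == 1) {
                        print $1
                        next
                    }
            }
        }'
}
```

Used as exportfs | exports_for_client sp3n05, it lists the candidates for exportfs -u. Review the list before un-exporting anything.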

16.2.1.3 Problems with conflicting NIM Cstate and SDR 

Before we discuss this problem, it is helpful to understand NIM client 
definition. Table 30 shows information on this. 

Table 30. NIM client definition information

boot_response        Cstate                     Allocations

install              BOS installation has       spot psspspot
                     been enabled.              lpp_source lppsource
                                                bosinst_data noprompt
                                                script psspscript
                                                mksysb mksysb_1

diag                 Diagnostic boot has        spot psspspot
                     been enabled.              bosinst_data prompt

maintenance          BOS installation has       spot psspspot
                     been enabled.              bosinst_data prompt

disk or customize    Ready for a NIM
                     operation.

migrate              BOS installation has       spot psspspot
                     been enabled.              lpp_source lppsource
                                                bosinst_data migrate
                                                script psspscript
                                                mksysb mksysb_1


A NIM client may be in a state that conflicts with your intentions for the node. 
You may intend to install a node, but setup_server returns a message that the 
nim -o bos_inst command failed for this client. When setup_server runs on the 
NIM master to configure this node, it detects that the node is busy installing 
and does not reconfigure it. This can happen for several reasons: 

• During a node NIM mksysb installation, the client node being installed was 
interrupted before the successful completion of the node installation. 

• A node was booted in diagnostics or maintenance mode, and now you 
would like to reinstall it. 

• The node was switched from one boot response to another. 

Each of these occurrences causes the client to be in a state that appears that 
the node is still installing. 

To correct this problem, check the client with the lsnim -l <client_name>
command and issue the following command for the NIM client:

# nim -Fo reset <client_name>

We recommend that you always set the boot response back to disk when you
switch it from one state to another.
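
The Cstate check can be sketched as a filter over lsnim -l output. The attribute layout (name = value) and the idle text "ready for a NIM operation" are assumptions based on Table 30 and on the lsnim output shown elsewhere in this chapter; the function names are hypothetical.

```shell
#!/bin/sh
# Read `lsnim -l <client_name>` output on stdin and print the Cstate value.
nim_cstate() {
    sed -n 's/^[[:space:]]*Cstate[[:space:]]*=[[:space:]]*//p' | head -n 1
}

# Return 0 if the client is idle, 1 if it is in any other state,
# and 2 if no Cstate line was found in the input.
cstate_is_idle() {
    cstate=$(nim_cstate | tr '[:upper:]' '[:lower:]')
    case "$cstate" in
        "")                           return 2 ;;
        "ready for a nim operation"*) return 0 ;;
        *)                            return 1 ;;
    esac
}
```

A client that is not idle is not necessarily in error. The reset is only called for when the state conflicts with what you intend to do with the node.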

16.2.1.4 Problems with allocating the SPOT resource 

If you get error messages when you allocate the SPOT resources, follow 
these steps to determine and correct the problem: 

1. Perform a check on the SPOT by issuing: 

# nim -o check spot_aix432 

This check should inform you if there is a problem. 

2. If you are unable to determine the problem with the SPOT, you can update 
the SPOT by issuing: 

# nim -o cust spot_aix432 

3. Deallocate resources allocated to clients with:

# nim -o deallocate -a spot_aix432

4. Finally, remove the SPOT with:

# nim -Fo remove spot_aix432

and then run setup_server to re-create the SPOT.

16.2.1.5 Problems with creation of the mksysb resource 

If setup_server cannot create the mksysb resource, verify that the specified
mksysb image is in the /spdata/sys1/install/images directory.

16.2.1.6 Problems with creation of the lppsource resource

If setup_server is unable to create the lppsource resource, verify that the
minimal required filesets reside in the lppsource directory:

/spdata/sys1/install/aix432/lppsource

To successfully create the lppsource resource on a boot/install server,
setup_server must acquire a lock in the lppsource directory on the CWS.
Failure to acquire this lock may mean that the lock was not released properly.
This lock file contains the hostname of the system that currently has the lock
and is located in /spdata/sys1/install/lppsource/lppsource.lock.

Log in to the system specified in the lock file and determine if setup_server is
currently running. If it is not running, remove the lock file and run setup_server
again on the system that failed to create the lppsource resource.
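
The lock check can be sketched as follows. The code assumes that the first line of the lock file holds the hostname, as described above; the function names are hypothetical, and the sketch never removes the lock itself.

```shell
#!/bin/sh
# Print the hostname recorded in a setup_server lock file.
# Returns 1 if the file is missing or empty.
lock_holder() {
    [ -s "$1" ] || return 1
    head -n 1 "$1" | tr -d ' \t'
}

# Report who holds the lock so that you know where to look for a
# running setup_server before removing the file by hand.
describe_lock() {
    if holder=$(lock_holder "$1"); then
        echo "lock held by ${holder}: check there for a running setup_server"
    else
        echo "no lock file at $1"
    fi
}
```

For example, describe_lock /spdata/sys1/install/lppsource/lppsource.lock reports the holder when the file exists.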

In another case of NIM allocation failures, you may get the following error 
messages: 

0042-001 nim: processing error encountered on "master":
rshd: 0826-813 Permission is denied. rc=6.
0042-006 m_allocate: (From_Master) rcmd Error 0
allnimres: 0016-254: Failure to allocate lpp_source resource lppsource_default
from server (node_number) (node_name) to client (node_number) (node_name)
(nim -o allocate ; rc=1)

This failure is caused by incorrect or missing rcmd support on the CWS, in the
/.rhosts file, for the boot/install server nodes. The /.rhosts file needs to have
an entry for the boot/install server hostname when trying to execute the
allnimres command. Running the setup_server command on the boot/install
server node should correct this problem.
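
A quick way to confirm the entry is to search the /.rhosts file for the boot/install server hostname. The sketch below assumes one host entry per line with the hostname in the first field; the function name rhosts_has_host and the sample hostname are hypothetical.

```shell
#!/bin/sh
# Return 0 if the rhosts file has a line whose first field is the host.
# $1 = path to the rhosts file, $2 = hostname to look for
rhosts_has_host() {
    [ -r "$1" ] || return 1
    awk -v h="$2" '$1 == h { found = 1 } END { exit (found ? 0 : 1) }' "$1"
}
```

On the CWS, a call such as rhosts_has_host /.rhosts sp3n01 tells you whether the entry that allnimres relies on is present.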

16.2.1.7 Problems with creation of the SPOT resource 

If setup_server fails to create the SPOT resource, verify that the following
resources are available:

1. Check if the file systems /, /tftpboot, and /tmp are full by using the
command: df -k

2. Check that a valid lppsource resource is available by using the command:

# lsnim -l lppsource
lppsource:
class = resources
type = lpp_source
server = master

location = /spdata/sys1/install/lppsource
alloc_count = 0
Rstate = ready for use
prev_state = unavailable for use
simages = yes

The Rstate should be ready for use, and simages should be yes.

If the simages attribute on the lppsource resource is no, then the support
images required to create the SPOT were not available in the lppsource
resource.

If installp images are missing from the lppsource directory, copy them
from the AIX 4.3 installation media to /spdata/sys1/install/aix432/lppsource.
Then, remove the lppsource with nim -o remove aix432 and run setup_server.
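
The two attributes to check can be pulled out of the lsnim output with a small filter. This sketch assumes the name = value layout shown in the sample output above; the function names nim_attr and lppsource_usable are hypothetical.

```shell
#!/bin/sh
# Print the value of one attribute from `lsnim -l` output read on stdin.
nim_attr() {
    awk -v a="$1" '$1 == a && $2 == "=" {
        sub(/^[^=]*=[ \t]*/, "")    # keep only the text after "="
        print
        exit
    }'
}

# Return 0 only when Rstate is "ready for use" and simages is "yes".
lppsource_usable() {
    out=$(cat)
    rstate=$(printf '%s\n' "$out" | nim_attr Rstate)
    simages=$(printf '%s\n' "$out" | nim_attr simages)
    [ "$rstate" = "ready for use" ] && [ "$simages" = "yes" ]
}
```

On the NIM master, lsnim -l lppsource | lppsource_usable returns a non-zero status when either condition fails.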

16.2.2 Diagnosing network boot process problems 

This section describes common problems with the network boot process.
We introduce the common checklist to go through, summarize the network
boot process, and diagnose common LED problems as examples.

16.2.2.1 Common checklists

When you have a problem with network booting, check the following
items:

• Check whether the cable is connected or not. 

• Monitor the log file with: 

# tail -f /var/adm/SPlogs/spmon/nc/nc.<frame_number>.<slot_number> 

for any error. 

If the nodecond command keeps failing, try to follow the manual node 
conditioning procedure as shown in 9.2.21, “nodecond” on page 282. 

• Check if there is any Kerberos error. 

• Check if there is any SDR error.

16.2.2.2 Overview of network boot process

In order to resolve any network boot related problems, you may need to 
understand the flow of network boot process. Here, we summarize the 
network boot process after you issue the nodecond command. 

• When nodecond exits, the node is in the process of broadcasting a bootp 
request. 

1. LED 231 sends a bootp broadcast packet through the network. 

2. LED 260 reaches the limit for not receiving a reply packet. 

3. Attempts to retrieve the boot image file. 

4. LED 299 received a valid boot image. 

5. Continues to read the boot record from the boot image and creates the
RAM file system.

6. Invokes: /etc/init(/usr/lib/boot/ssh) 

7. Invokes: /sbin/rc.boot 

• After rc.boot is executed:

1. Cannot execute bootinfo_<platform>, hang at LED C10.

2. Remove unnecessary files from RAM file system.

3. Read IPL Control Block.

4. Cannot determine the type of boot, hang at LED C06.

5. LED 600 executes cfgmgr -fv. Set IP resolution by /etc/hosts.

6. LED 606 configures lo0, en0. If error, hang at LED 607.

7. LED 608 retrieves niminfo (/tftpboot/<reliable_hostname>) file through
tftp. If error, hang at LED 609.

8. Create /etc/hosts and configure IP route. If error, hang at LED 613.

9. LED 610 performs NFS mount of the SPOT file system.

If error, hang at LED 611.

10. LED 612 executes the rc.bos_inst script.

11. Change Mstate attribute of the NIM client object to: in the process
of booting

12. LED 610 creates local mount point. If error, hang at LED 625.

Attempt to NFS mount directory. If error, hang at LED 611.

Clear the information attribute of the NIM client object.

13. LED 622 links the configuration methods and executes cfgmgr -vf for
the first and second phase.

14. Exit /etc/rc.boot for the first phase and start the second phase.

15. Set /etc/hosts for IP resolution and reload niminfo file.

16. Execute rc.bos_inst again.

17. Delete the rc.boot file.

18. Define the IP parameters.

19. Copy ODM objects for pre-test diagnostics.

20. Clear the information attribute of the NIM client object.

21. Invoke the bi_main script.

• After the bi_main script is invoked: 

1. Invoke the initialization function and change the NIM Cstate attribute to
Base Operating System installation is being performed.

2. LED C40 retrieves bosinst.data, image.data and preserve.list files and 
creates a file with the description of all the disks. 

3. LED C42 changes the NIM information attribute to 
extract_diskette_data and verify the existence of image.data. 

4. Change the NIM information attribute to setting_console and set the
console from the bosinst.data file. If error, hang at LED C45. 

5. Change the NIM information attribute to initialization. 

6. LED C44 checks for available disks on the system. 

7. LED C46 validates target disk information. 

8. LED C48 executes the BOSMenus process. 

9. LED C46 initializes the log for bi_main script and sets the minimum 
values for LVs and file systems. 

10. Prepare for restoring the operating system. 

11. LED C54 restores the base operating system.

12. LED C52 changes the environment from RAM to the image just 
installed. 

13. LED C46 performs miscellaneous post-install procedures. 

14. LED C56 executes BOS installs customization. 

15. LED C46 finishes and reboots the system 

• After the pssp_script script is invoked:

1. u20 creates log directory (enters the function create_directories).

2. u21 establishes working environment (enters the function
setup_environment).

• u03 gets the node.install_info file from the master.

• u04 expands the node.install_info file.

3. u22 configures the node (enters the function configure_node).

• u57 gets the node.config_info file from the master.

• u59 gets the cuat.sp template from the master.

4. u23 creates/updates /etc/ssp files (enters the function create_files).

• u60 creates/updates /etc/ssp files.

5. u24 updates /etc/hosts file (enters the function update_etchosts).

6. u25 gets configuration files (enters the function get_files).

• u61 gets /etc/SDR_dest_info from the boot/install server.

• u79 gets script.cust from the boot/install server.

• u50 gets tuning.cust from the boot/install server.

• u54 gets spfbcheck from the boot/install server.

• u56 gets psspfb_script from the boot/install server.

• u58 gets psspfb_script from the control workstation.

7. u26 gets authentication files (enters the function authent_stuff).

• u67 gets /etc/krb.conf from the boot/install server.

• u68 gets /etc/krb.realms from the boot/install server.

• u69 gets krb-srvtab from the boot/install server.

8. u27 updates the /etc/inittab file (enters the function update_etcinittab).

9. u28 performs MP-specific functions (enters the function upmp_work).

• u52 Processor is MP.

• u51 Processor is UP.

• u55 Fatal error in bosboot.

10. u29 installs prerequisite filesets (enters the function install_prereqs).

11. u30 installs ssp.clients (enters the function install_ssp_clients).

• u80 mounts lppsource and installs ssp.clients.

12. u31 installs ssp.basic (enters the function install_ssp_basic).

• u81 installs ssp.basic.

13. u32 installs ssp.ha (enters the function install_ssp_ha).

• u53 installs ssp.ha.

14. u33 installs ssp.sysctl (enters the function install_ssp_sysctl).

• u82 installs ssp.sysctl.

15. u34 installs ssp.pman (enters the function install_ssp_pman).

• u41 configures switch (enters the function config_switch).

16. u35 installs ssp.css (enters the function install_ssp_css).

• u84 installs ssp.css.

17. u36 installs ssp.jm (enters the function install_ssp_jm).

• u85 installs ssp.jm.

18. u37 deletes the master .rhosts entry (enters the function
delete_master_rhosts).

19. u38 creates a new dump logical volume (enters the function
create_dump_lv).

• u86 creates a new dump logical volume.

20. u39 runs the customer's tuning.cust (enters the function
run_tuning_cust).

21. u40 runs the customer's script.cust (enters the function
run_script_cust).

• u87 runs the customer's script.cust script file.

• u42 runs the psspfb_script (enters the function run_psspfb_script).
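
The hang codes named in the boot flow above can be kept in a small lookup for quick reference during a netboot. This is a sketch covering only codes mentioned in this section; the function name led_hint and the wording of each hint are illustrative, not an official LED table.

```shell
#!/bin/sh
# Print a short hint for an LED value seen during network boot.
# Returns 1 for codes this sketch does not cover.
led_hint() {
    case "$1" in
        231)     echo "bootp broadcast sent; waiting for a reply" ;;
        260)     echo "bootp reply limit reached" ;;
        299)     echo "valid boot image received" ;;
        C10|c10) echo "cannot execute bootinfo_<platform>" ;;
        C06|c06) echo "cannot determine the type of boot" ;;
        607)     echo "error configuring lo0 or en0" ;;
        609)     echo "error retrieving niminfo file through tftp" ;;
        611)     echo "NFS mount failed" ;;
        613)     echo "error creating /etc/hosts or configuring IP route" ;;
        625)     echo "error creating local mount point" ;;
        C45|c45) echo "error setting the console" ;;
        *)       echo "no hint recorded for LED $1"; return 1 ;;
    esac
}

led_hint 611
```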


16.2.2.3 Problem with 231 LED 

When the node broadcasts a bootp request, the remote boot image is
located through the entry held in /etc/bootptab, which contains the IP
addresses and the location of the boot image. The boot image in /tftpboot
is simply a link to the correct type of boot image for the node. This is
LED 231. The following message is found in the AIX V4.3 Messages Guide
and Reference, SC23-4129:

Display Value 231 
Explanation 

Progress indicator. Attempting a Normal-mode system restart from 
Ethernet specified by selection from ROM menus. 

System Action 
The system retries. 

User Action 

If the system halts with this value displayed, record SRN 101-231 in
item 4 on the Problem Summary Form. Report the problem to your hardware
service organization, and then stop. You have completed these
procedures.

To resolve this, try the following: 

1. Try the manual node conditioning procedure and test network 
connectivity 

2. Check the /etc/inetd.conf and look for bootps. 

3. Check the /etc/bootptab file for an entry of the problem node. Note that 
in multiple frame configurations if you do not define the boot/install 
server in the Volume_Group class, it defaults to the first node in that 
frame. 

4. Check for the boot/install server with the splstdata -b command. 

5. Rerun the spbootins command with setup_server. 
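
The check in step 3 can be scripted. The following is a minimal sketch, assuming that each /etc/bootptab entry begins with the node's host name followed by a colon. The function name bootptab_has_node is a hypothetical helper, not a PSSP or AIX command:

```shell
#!/bin/sh
# Sketch only: bootptab_has_node is a hypothetical helper name.
# Assumption: each bootptab entry starts with "<hostname>:".

# Usage: bootptab_has_node <bootptab_file> <node_hostname>
# Returns 0 if an entry for the node exists, non-zero otherwise.
bootptab_has_node() {
    # Compare the first colon-separated field literally with the host name.
    awk -F: -v node="$2" '
        $1 == node { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}
```

For example, bootptab_has_node /etc/bootptab sp3n06.msc.itso.ibm.com returns 0 when the node is listed. If it returns non-zero, continue with steps 4 and 5.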

16.2.2.4 Problem with 611 LED 

At this stage of the netboot process, all the files and directories are NFS 
mounted in order to perform the installation, migration, or customization. The 
following message is found in the AIX V4.3 Messages Guide and Reference, 
SC23-4129: 

Display Value 611 
Explanation 

Remote mount of the NFS file system failed. 

User Action 

Verify that the server is correctly exporting the client file systems. 
Verify that the client.info file contains valid entries for exported 
file systems and server. 

To resolve this problem, try: 

1. Check, with the following command, if the NIM client machine has the 
exported directories listed: 

# lsnim -Fl <client> | grep exported 

2. Compare with the output of the exportfs command. 

3. Verify that the directory 
/spdata/sys1/install/<aix_version>/spot/spot_<aix_version>/usr/sys/inst.images 
is not a linked directory. 

4. Check, with the following command, if the image file is linked to the correct 
boot image file: 

# ls -l /tftpboot/sp3n06.msc.itso.ibm.com 

5. If you cannot find the cause of the problem, clean up the NIM setup and 
the exported directories as follows: 

1. Remove the entries from /etc/exports that match: 

/export/nim/scripts/* 

/spdata/* 

2. Remove NFS-related files in /etc: 

# rm /etc/state 

# rm /etc/sm/* /etc/sm.bak/* 

3. Unconfigure NIM and remove the NIM master fileset: 

# nim -o unconfig master 

# installp -u bos.sysmgt.nim.master 

4. Set the node or nodes back to install and run setup_server. This will 
also reinstall NIM: 

# spbootins -r install -l <node#> 

5. Refresh the newly created exports list: 

# exportfs -ua 

# exportfs -a 

6. Refresh NFS: 

# stopsrc -g nfs 

# stopsrc -g portmap 

# startsrc -g portmap 

# startsrc -g nfs 

16.2.2.5 Problems with C45 LED 

When you install the node, the installation sometimes hangs at LED C45. The 
following message is found in AIX V4.3 Messages Guide and Reference, 
SC23-4129: 

Explanation 

Cannot configure the console. 

System Action 

The cfgcon command has failed. 

User Action 

Ensure that the media is readable, that the display type is supported, 
and that the media contains device support for the display type. 

If this happens, try the following: 

1. Verify which fileset contains the cfgcon command by entering: 

# lslpp -w | grep cfgcon 

which returns: 

/usr/lib/methods/cfgcon bos.rte.console File 

2. With the following command, verify if this fileset is in the SPOT: 

# nim -o lslpp -a filesets=bos.rte.console spot_aix432 

3. Check if any device fileset is missing from SPOT. 

4. If there is, install an additional fileset on the SPOT and re-create the boot 
image files. 

16.2.2.6 Problems with C48 LED 

When you migrate a node, the process may hang at LED C48. The following 
message is found in AIX V4.3 Messages Guide and Reference, SC23-4129: 

Display Value c48 
Explanation 

Prompting you for input. 

System Action 
BosMenus is being run. 

User Action 

If this LED persists, you must provide responses at the console. 

To resolve the problem: 

1. With the following command, check NIM information: 

# lsnim -l <node_name> 

2. Open tty: 

# s1term -w frame_number node_number 

3. If the node cannot read the image.data file, do as follows: 

1. Check if the bos fileset exists in lppsource: 

# nim -o lslpp -a filesets=bos lppsource_aix432 

2. Check if the image.data file exists: 

# dd if=/spdata/sys1/install/aix432/lppsource/bos bs=1k count=128 | restore -Tvqf - ./image.data 

3. Then, check the file permission on image.data. 

16.2.2.7 Problems with node installation from mksysb 

When you have a problem installing from a mksysb image from its boot/install 
server: 

• Verify that the boot/install server is available: 

1. Check the client's boot/install server and its hostname by issuing: 

# splstdata -b 

2. Telnet to the boot/install server if it is not the CWS. 

3. Look at the /etc/bootptab to make sure the node you are installing is 
listed in this file. If the node is not listed in this file, you should follow 
the NIM debugging procedure shown on page 171 of the PSSP 
Diagnosis Guide, GA22-7350. 

4. If the node is listed in this file, continue to the next step. 

• Open a write console to check for console messages. 

1. At the control workstation, open a write console for the node with the 
install problem by issuing: 

# spmon -o node<node_number> 

or 

# s1term -w frame_number node_number 

2. Check any error message from the console that might help determine 
the cause of the problem. Also, look for NIM messages that might 
suggest that the installation is proceeding. An example of a NIM 
progress message is: 

/ step_number of total_steps complete 

which tells how many installation steps have completed. This message 
is accompanied by an LED code of u54. 

• Check to see if the image is available and the permissions are appropriate 
by issuing: 

# /usr/lpp/ssp/bin/splstdata -b 

The next_install_image field lists the name of the image to be installed. 
If the field for this node is set to default, the default image specified by 
the install_image attribute of the SP object will be installed. The 
images are found in the /spdata/sys1/install/images directory. You can 
check the images and their permissions by issuing: 

# ls -l /spdata/sys1/install/images 

This should return: 

total 857840 

-rw-r--r-- 1 root sys 130083840 Jan 14 11:15 bos.obj.ssp.4.3 

The important things to check are that the images directory has 
execute (x) permissions by all, and that the image is readable (r) by all. 

The setup_server script tries to clean up obsolete images on install 
servers. If it finds an image in the /spdata/sys1/install/images directory 
that is not needed by an install client, it deletes the image. However, 
setup_server deletes images on the control workstation only if the site 
environment variable REMOVE_IMAGES is true. 

• Review the NIM configuration and perform NIM diagnostics for this node. 
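
The permission check on the images directory and the image file can be sketched as a small script. This is an illustration only: the function names are hypothetical, and the check inspects the mode string printed by ls -ld, so it ignores ACLs and treats setuid, setgid, or sticky variants of the execute bit as not matching:

```shell
#!/bin/sh
# Sketch only: mode_of, image_readable_by_all, and dir_searchable_by_all
# are hypothetical helper names, not PSSP commands.

# Print the mode string (for example, -rw-r--r--) of a file or directory.
mode_of() {
    ls -ld "$1" | awk '{ print $1 }'
}

# Succeeds if owner, group, and others all have read (r) permission.
image_readable_by_all() {
    case $(mode_of "$1") in
        ?r??r??r??*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if the path is a directory with execute (x) permission for all.
dir_searchable_by_all() {
    case $(mode_of "$1") in
        d??x??x??x*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, dir_searchable_by_all /spdata/sys1/install/images followed by image_readable_by_all on the image file reproduces the two checks described above.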


16.3 Diagnosing SDR problems 

This section shows a few common problems related to SDR and its recovery 
actions. 

16.3.1 Problems with connection to server 

Sometimes, when you change the system or network configuration and issue an 
SDR command, such as splstdata -b, on the node, you get an error message 
about failing to connect to the server. If so, try the following: 

1. Type spget_syspar on the node showing the failing SDR commands. 

2. If the spget_syspar command fails, check the /etc/SDR_dest_info file on 
the same node. It should have at least two records in it. These records are 
the primary and the default records. They should look like: 

# cat SDR_dest_info 
default:192.168.3.130 
primary:192.168.3.130 
nameofdefault:sp3en0 
nameofprimary:sp3en0 

If this file is missing or does not have these two records, the node may not 
be properly installed, or the file has been altered or corrupted. You can 
edit the file so that it contains the two records above, or copy the file from a 
working node in the same system partition. 

3. If the spget_syspar command is successful, check to make sure that the 
address is also the address of a valid system partition. If it is, try to ping 
that address. If the ping fails, contact your system administrator to 
investigate a network problem. 

4. If the value returned by the spget_syspar command is not the same as the 
address in the primary record of the /etc/SDR_dest_info file, the 
SP_NAME environment variable is directing SDR requests to a different 
address. Make sure that this address is a valid system partition. 

5. If the value of the SP_NAME environment variable is a hostname, try 
setting it to the equivalent dotted decimal IP address. 

6. Check for the existence of the SDR server process (sdrd) on the CWS 
with: 

# ps -ef | grep sdrd 

If the process is not running, do the following: 

• Check the sdrd entry in the file /etc/inittab on the control workstation. It 
should read: 

sdrd:2:once:/usr/bin/startsrc -g sdr 

• Check the SDR server logs in 
/var/adm/SPlogs/sdr/sdrdlog.<server_ip>.pid, where pid is a process ID. 

• Issue /usr/bin/startsrc -g sdr to start the SDR daemon. 
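
The check of /etc/SDR_dest_info described in step 2 can be sketched as follows. The function names are hypothetical helpers, and the sketch assumes the record format shown in the example above (a keyword, a colon, and a value on each line):

```shell
#!/bin/sh
# Sketch only: sdr_dest_info_ok and sdr_primary_address are hypothetical
# helper names. Assumed record format: "keyword:value", one per line.

# Succeeds if the file is readable and has both a default and a primary record.
sdr_dest_info_ok() {
    [ -r "$1" ] || return 1
    grep -q '^default:' "$1" && grep -q '^primary:' "$1"
}

# Print the address from the first primary record, for comparison with
# the value returned by spget_syspar (step 4).
sdr_primary_address() {
    sed -n 's/^primary://p' "$1" | head -n 1
}
```

For example, run sdr_dest_info_ok /etc/SDR_dest_info on the failing node. If it fails, repair or copy the file as described in step 2.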

16.3.2 Problem with class corrupted or non-existent 

If an SDR command ends with rc=102 (internal data format inconsistency) or 
026 (class does not exist), first make sure the class name is spelled correctly 
and the case is correct (see the table of classes and attributes in “The 
System Data Repository” appendix in PSSP: Administration Guide, 
SA22-7348). Then, follow the steps in “SDR Shadow Files” in the System Data 
Repository appendix in PSSP: Administration Guide, SA22-7348. 

Then, check if the /var file system is full. If this is the case, either define more 
space for /var or remove unnecessary files. 


16.4 Diagnosing user access related problems 

As you have seen in the previous chapter, AMD was replaced by the AIX 
automounter starting with PSSP 2.3. Thus, we briefly discuss general AMD 
checklists (for PSSP 2.2 or earlier) and extend the discussion to user access 
and AIX automount problems. 

16.4.1 Problems with AMD 


• Check if the AMD daemon is running. If not, restart it with: 

/etc/amd/amd_start 

• Make sure that the user’s home directories are exported. If not, update 
/etc/exports and run the exportfs -a command. 

• Check the /etc/amd/amd-maps/amd.u AMD map file for the existence of 
a user ID if you have problems with logging on to the system. An entry 
should look like this: 

netinst type:=link;fs:=/home 
efri host==sp3en0;type:=link;fs:=/home/sp3en0 \
host!=sp3en0;type:=nfs;rhost:=sp3en0;rfs:=/home/sp3en0 

• If there is no entry for the user ID you would like to use, add it to this file. 
Make sure that the updates are distributed after the change by issuing: 

# dsh -w <nodelist> supper update user.admin sup.admin power_system 

Check whether the network connection is still working. 

• Get the information about the AMD mounts by issuing the /etc/amd/amq 
command. If the output of amq looks as follows: 

amq: localhost: RPC: Program not registered 

your problem could be: 

• The AMD daemon is not running. 

• The portmap daemon is not running. 

• The AMD daemon is waiting for a response from the NFS server that is 
not responding. 

Make sure that the portmap daemon is running and that your NFS server 
is responding. If the portmap daemon is inoperative, start it with the 
startsrc -s portmap command. 

If you have an NFS server problem, check the amd.log file located in the 
/var/adm/SPlogs/amd directory. 

Stop AMD by issuing kill -15 <amd_pid>, solve your NFS problems, and 
start AMD again with /etc/amd/amd_start. 

• If you have user access problems, do the following: 

• Verify that the login and rlogin options for your user are set to true. 

• Check the user path or .rhosts on the node. If you have problems 
executing rsh to the node, check the user path to see if the user is 
supposed to be a Kerberos principal. 


• If you have problems executing an SP user administrative command, you 
may get an error message similar to the following: 

0027-153 The user administration function is already in use. 

In this case, the most probable cause is that another user administrative 
command is running, and there is a lock in effect for the command to let it 
finish. If no other administrative command is running, check the 
/usr/lpp/ssp/config/admin directory for the existence of a .userlock file. If 
there is one, remove it and try to execute your command again. 

16.4.2 Problems with user access or automount 

This section shows a few examples of problems with logging in to the SP 
system or accessing users' home directories. 

16.4.2.1 Problems with logging in an SP node by a user 

Check the /etc/security/passwd file. If a user is having problems logging in to 
nodes in the SP System, check the login and rlogin attributes for the user in 
the /etc/security/passwd file on the SP node. 

Check the Login Control facility to see whether the user's access to the node 
has been blocked. The system administrator should verify that the user is 
allowed access. The system administrator may have blocked interactive 
access so that parallel jobs could run on a node. 

16.4.2.2 Problems with accessing user’s directories 

When you have a problem accessing a user’s directory, verify that the 
automount daemon is running. 

To check whether the automount daemon is running or not, issue: 

# ps -ef | grep automount 

for AIX 4.3.0 or earlier systems, and 

# lssrc -g autofs 

for AIX 4.3.1 or later systems. 

- Note - 

On AIX 4.3.1 and later systems, the AutoFS function replaces the 
automount function of AIX 4.3.0 and earlier systems. All automount 
functions are compatible with AutoFS. With AutoFS, file systems are 
mounted directly to the target directory instead of using an intermediate 
mount point and symbolic links. 


If automount is not running, check with the mount command to see if any 
automount points are still in use. If you see an entry similar to the following 
one, there is still an active automount mount point. For AIX 4.3.0 or earlier 
systems: 

# mount 

sp3n05.msc.itso.ibm.com (pid23450@/u) /u afs Dec 07 15:41 
ro,noacl,ignore 

For AIX 4.3.1 and later systems: 

# mount 

/etc/auto/maps/auto.u /u autofs Dec 07 11:16 ignore 

If the mount command does not show any active mounts for automount, issue 
the following command to start the automounter: 

# /etc/auto/startauto 

If this command succeeds, issue the previous ps or lssrc command again to 
verify that the automount daemon is actually running. If so, verify that the 
user directories can be accessed. 

Note that the automount daemon should be started automatically during boot. 
Check to see if your SP system is configured for automounter support by 
issuing: 

# splstdata -e | grep amd_config 

If the result is true, you have automounter support configured for the SP in 
your Site Environment options. 

If the startauto command was successful, but the automount daemon is still 
not running, check to see if the SP automounter function has been replaced 
by issuing: 

# ls -l /etc/auto/*.cust 

If the result of this command contains an entry similar to: 

-rwx------ 1 root system 0 Dec 12 13:20 startauto.cust 


the SP function to start the automounter has been replaced. View this file to 
determine which automounter was started and follow local procedures for 
diagnosing problems for that automounter. 

If the result of the ls command does not show any executable user 
customization script, check both the automounter log file 
/var/adm/SPlogs/auto/auto.log and the daemon log file 
/var/adm/SPlogs/SPdaemon.log for error messages. 

If the startauto command fails, find the reported error messages in the PSSP: 
Messages Reference, GA22-7352, and follow the recommended actions. 
Check the automounter log file /var/adm/SPlogs/auto/auto.log for additional 
messages. Also, check the daemon log file /var/adm/SPlogs/SPdaemon.log 
for messages that may have been written by the automounter daemon itself. 

If automounter is running, but the user cannot access user files, the problem 
may be that automount is waiting for a response from an NFS server that is 
not responding or that there is a problem with a map file. Check the 
/var/adm/SPlogs/SPdaemon.log for information relating to NFS servers not 
responding. 

If the problem does not appear to be related to an NFS failure, you will need 
to check your automount maps. Look at the /etc/auto/maps/auto.u map file to 
see if an entry for the user exists in this file. 

Another possible problem is that the server is exporting the file system to an 
interface that is not the interface from which the client is requesting the 
mount. This problem can be found by attempting to mount the file system 
manually on the system where the failure is occurring. 

Stopping and restarting automount 

If you have determined that you need to stop and restart the automount 
daemon, the cleanest and safest way is to reboot the system. However, if you 
cannot reboot the system, use the following steps: 

For AIX 4.3.0 or earlier systems: 

1. Determine whether any users are already working in directories 
mounted by the automount daemon. Issue: 

# mount 

2. Stop the automount daemon: 

# kill -15 process_id 

where process_id is the process number listed by the previous mount 
command. 

- Note - 

It is important that you DO NOT stop the daemon with kill -kill or 
kill -9. This will prevent the automount daemon from cleaning up its 
mounts and releasing its hold on the file systems. It may cause file system 
hangs and force you to reboot your system to recover those file systems. 

3. Start the automount daemon: 

# /etc/auto/startauto 

You can verify that the daemon is running by issuing the previous mount 
or ps commands. 

For AIX 4.3.1 or later systems: 

1. Determine whether any users are already working on the directories 
mounted by the automountd daemon with the command: mount 

2. Stop the automountd daemon with this command: 

# stopsrc -g autofs 

3. Restart the automounter: 

# /etc/auto/startauto 

You can verify that the daemon is running by issuing the previous lssrc 
command. 


16.5 Diagnosing file collection problems 

In this section, we summarize common checklists for file collection problems 
and explain how you can resolve them. 

16.5.1 Common checklists 

The following checklists give you an idea of what to do when you get error 
messages related to the file collection problems: 

• Check the TCP/IP configuration because file collection uses the Ethernet 
network (en0). Check the en0 adapter status, and the routes if a 
boot/install server exists, and test the connection with the ping command 
from client to server. Also, check hostname resolution with nslookup if DNS 
is set up. 

• Check whether the file collection is resident by issuing the supper status 
command. The output from the command looks like: 

# /var/sysman/supper status 

Collection      Resident   Access Point          Filesystem   Size 
node.root       Yes        /                     -            - 
power_system    Yes        /share/power/system   -            - 
sup.admin       Yes        /var/sysman           -            - 
user.admin      Yes        /                     -            - 

If the update of the file collection failed, and this file collection is not 
resident on the node, install it by issuing the command: 

# supper install <file collection> 

Check if the file collection server daemon is running on the CWS and 
boot/install server: 

On the CWS: 

[sp3en0:/]# ps -ef | grep sup 
root 10502 5422 0 Dec 03 - 0:00 /var/sysman/etc/supfilesrv -p /var/sysman/sup/supfilesrv.pid 

# dsh -w sp3n01 ps -ef | grep sup 
sp3n01: root 6640 10066 0 10:44:21 - 0:00 /var/sysman/etc/supfilesrv -p /var/sysman/sup/supfilesrv.pid 

Use dsh /var/sysman/supper where on the CWS to see which machine is 
each node’s supper server as follows: 

[sp3en0:/]# dsh -w sp3n01,sp3n05 /var/sysman/supper where 
sp3n01: supper: Collection node.root would be updated from server 
sp3en0.msc.itso.ibm.com. 

sp3n01: supper: Collection power_system would be updated from server 
sp3en0.msc.itso.ibm.com. 

sp3n01: supper: Collection sup.admin would be updated from server 
sp3en0.msc.itso.ibm.com. 

sp3n01: supper: Collection user.admin would be updated from server 
sp3en0.msc.itso.ibm.com. 

sp3n05: supper: Collection node.root would be updated from server 
sp3n01en1.msc.itso.ibm.com. 

sp3n05: supper: Collection power_system would be updated from server 
sp3n01en1.msc.itso.ibm.com. 

sp3n05: supper: Collection sup.admin would be updated from server 
sp3n01en1.msc.itso.ibm.com. 

sp3n05: supper: Collection user.admin would be updated from server 
sp3n01en1.msc.itso.ibm.com. 

• Check that the server has the supman user ID created. 

• Check the /etc/services file on the server machine as follows: 

[sp3en0:]# grep sup /etc/services 
supdup 95/tcp 

supfilesrv 8431/tcp 

• Check whether the supfilesrv daemon is defined and that it has a correct 
port. 

• Check the log files located in the /var/sysman/logs directory. 

• Check the log files located in the /var/adm/SPlogs/filec directory. 
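
The /etc/services check in the list above can be sketched as a small function. The name service_port is a hypothetical helper, and the sketch assumes the usual services file layout of a service name followed by port/protocol:

```shell
#!/bin/sh
# Sketch only: service_port is a hypothetical helper name.
# Assumed layout: "<service> <port>/<protocol>" with whitespace between fields.

# Usage: service_port <services_file> <service_name>
# Prints the port/protocol of the first matching entry, or nothing.
service_port() {
    awk -v svc="$2" '$1 == svc { print $2; exit }' "$1"
}
```

For example, service_port /etc/services supfilesrv prints 8431/tcp on a server configured as shown above. An empty result suggests that the supfilesrv service is not defined.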


16.6 Diagnosing Kerberos problems 

In this section, we summarize the common checklist of Kerberos problems. 
Then, we describe possible causes and the action needed to be taken to 
resolve them. In addition, we briefly describe the difference between PSSP 
v2 and PSSP v3. 

16.6.1 Common checklists 

Before we start the Kerberos problem determination, we recommend 
checking the following list: 

• Check that hostname resolution works, whether you are using 
DNS or the local hosts file. Remember that the encrypted Kerberos service key 
is created with the hostname. 

• Check your Kerberos ticket by issuing the klist or k4list command. If a 
ticket is expired, destroy it with the kdestroy or k4destroy command and 
reissue it with the command kinit or k4init as follows: 

# k4init root.admin 

Then, type the Kerberos password twice. 

• Check the /.klogin file. 

• Check whether the Kerberos commands are in the PATH environment 
variable. 

• Check your file systems by using the df -k command. Remember that /var 
contains a Kerberos database and /tmp contains a ticket. 

• Check the date on the authentication server and clients. (Kerberos can 
handle only a five minute difference.) 

• Check if the Kerberos daemons are running on the control workstation. 

• Check /etc/krb.realms on the client nodes. 

• Check if you have to recreate /etc/krb-srvtab on the node. 

• Check /etc/krb-srvtab on the authentication server. 
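
The date check in the list above can be expressed as a simple comparison. The following sketch assumes that you have already obtained both clocks as seconds since the epoch (how you obtain them depends on your system), and that a difference of exactly five minutes is still acceptable. The function name within_kerberos_skew is a hypothetical helper:

```shell
#!/bin/sh
# Sketch only: within_kerberos_skew is a hypothetical helper name.
# Assumptions: both times are given in seconds since the epoch, and a
# difference of exactly the limit (default 300 seconds) is accepted.

# Usage: within_kerberos_skew <server_seconds> <client_seconds> [limit_seconds]
within_kerberos_skew() {
    limit=${3:-300}
    diff=$(( $1 - $2 ))
    # Take the absolute value of the difference.
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    [ "$diff" -le "$limit" ]
}
```

If the function fails for a node, synchronize the date on that node with the authentication server before continuing with the other checks.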

16.6.2 Problems with a user’s principal identity 

A bad Kerberos name format generates the following error 
message: 

sp3en0 # k4init 
Kerberos Initialization 
Kerberos name: root.admin 
k4list: 2502-003 Bad Kerberos name format 

The probable causes are a bad Kerberos name format, a Kerberos principal 
does not exist, an incorrect Kerberos password, or a corrupted Kerberos 
database. Recovery action is to repeat the command with the correct syntax. 
An example is: 

# k4init root.admin 

Another example is a missing root.admin principal in the /.klogin file on the 
control workstation as follows: 

sp3n05 # dsh -w sp3en0 date 

sp3en0:krshd:Kerberos Authentication Failed:User 
root.admin@MSC.ITSO.IBM.COM is not authorized to login to account root. 
sp3en0: spk4rsh: 0041-004 Kerberos rcmd failed: rcmd protocol failure. 

Check whether the /.klogin file has an entry for the user principal. If all the information 
is correct, but the Kerberos command fails, suspect a database corruption. 

16.6.3 Problems with a service’s principal identity 

When an /etc/krb-srvtab file is corrupted on a node, and the remote command 
service (rcmd) fails to work from the control workstation, we have the following 
error message: 

sp3en0 # dsh -w sp3n05 date 

sp3n05:krshd:Kerberos Authentication Failed. 

sp3n05: spk4rsh: 0041-004 Kerberos rcmd failed: rcmd protocol failure. 

The probable causes for this problem are that the krb-srvtab file does not 
exist on the node or on the control workstation, the krb-srvtab file has the 
wrong key version, or the krb-srvtab file is corrupted. Analyze the error 
messages to confirm the service’s principal identity problem. Make sure the 
/.klogin, /etc/krb.realms, and /etc/krb.conf files are consistent with those 
of the Kerberos authentication server. 

16.6.4 Problems with authenticated services 

When hardmon is having problems due to a Kerberos error, we have the 
following message: 

sp3en0 # spmon -d 

Opening connection to server 

0026-706 Cannot obtain service ticket for hardmon.sp3en0 
Kerberos error code is 8, Kerberos error message is: 

2504-008 Kerberos principal unknown 

The probable causes are that the ticket has expired, a valid ticket does not 
exist, the host name resolution is not correct, or the ACL files do not have 
correct entries. Destroy the ticket using k4destroy and issue a new ticket by 
issuing k4init root.admin if the user is root. Then, check the hostname 
resolution, ACL files, and the Kerberos database. 

16.6.5 Problems with Kerberos database corruption 

The database can be corrupted for many reasons, and messages also vary 
based on the nature of the corruption. Here, we provide an example of 
messages received because of Kerberos database corruption: 

sp3en0 # k4init root.admin 

Kerberos Initialization for "root.admin" 

k4init: 2504-010 Kerberos principal has null key 

Rebuild the Kerberos database as follows: 

1. Ensure the following directories are included in your PATH: 

• /usr/lpp/ssp/kerberos/etc 

• /usr/lpp/ssp/kerberos/bin 

• /usr/lpp/ssp/bin 

2. On the CWS, login as root and execute the following commands: 

# /usr/lpp/ssp/kerberos/bin/kdestroy 

The kdestroy command destroys the user's authentication tickets that are 
located in /tmp/tkt$uid. 

3. Destroy the Kerberos authentication database, which is located in 
/var/kerberos/*: 

# /usr/lpp/ssp/kerberos/etc/kdb_destroy 

4. Remove the following files: 

• krb-srvtab: Contains the keys for services on the nodes 

• krb.conf: Contains the SP authentication configuration 

• krb.realms: Specifies the translations from host names to 
authentication realms 

# rm /etc/krb* 

5. Remove the .klogin file that contains a list of principals that are authorized 
to invoke processes as the root user with the SP authenticated remote 
commands rsh and rcp: 

# rm /.klogin 

6. Remove the Kerberos Master key cache file: 

# rm /.k 

7. Ensure that the authentication database files are completely removed: 

# rm /var/kerberos/database/* 

8. Change the /etc/inittab entries for Kerberos: 

# chitab "kadm:2:off:/usr/lpp/ssp/kerberos/etc/kadmind -n" 

# chitab "kerb:2:off:/usr/lpp/ssp/kerberos/etc/kerberos" 

9. Refresh the /etc/inittab file: 

# telinit q 

10. Stop the daemons: 

# stopsrc -s hardmon 

# stopsrc -s splogd 

11. Configure SP authentication services: 

# /usr/lpp/ssp/bin/setup_authent 

This command will add the necessary remote command (RCMD) 
principals for the nodes to the Kerberos database based on what is 
defined in the SDR for those nodes. 

12. Set the node's bootp response to customize and run setup_server: 

# spbootins -r customize -l <nodelist> 

13. Reboot the nodes. 

After the node reboots, verify that the bootp response toggled back to 
disk. 

14. Start hardmon and splogd on the CWS: 

# startsrc -s hardmon 

# startsrc -s splogd 

After step 12 and step 13 are done, the /etc/krb-srvtab files are distributed 
onto the nodes. However, if you cannot reboot the system, do as follows: 

1. Run the command: 

# spbootins -r customize -l <nodelist> 

2. On the CWS, change the directory to /tftpboot and verify that there is a 
<node_name>-new-srvtab file for each node. 

3. Using FTP in binary mode, transfer each node's 
/tftpboot/<node_name>-new-srvtab file from the CWS to that node and 
rename the file to /etc/krb-srvtab. 

4. Set the nodes back to disk on the CWS: 

# spbootins -r disk -l <nodelist> 


16.6.6 Problems with decoding authenticator 

When you change the host name and do not follow the procedure correctly, 
the /etc/krb-srvtab file sometimes produces an error, and you may see the 
following message: 

kshd:0041-005 kerberos rsh or rcp failed: 

2504-031 Kerberos error: can't decode authenticator 

Re-create the /etc/krb-srvtab file from the boot/install server, and propagate it 
to the node. If you can reboot the node, simply set boot_response to 
customize, and reboot the node. Otherwise, do as follows: 

On the control workstation, run spbootins, setting boot_response to 
customize: 

# spbootins -r customize -l <node_list> 

Then, on the control workstation, change the directory to /tftpboot and verify 
the <node_name>-new-srvtab file. FTP this file to the node’s /etc, and 
rename the file to krb-srvtab. Then set the node back to disk as follows: 

# spbootins -r disk -l <node_list> 


16.6.7 Problems with the Kerberos daemon 

Here is an example of the messages logged when the Kerberos daemons are 
inactive because of a missing krb.realms file on the control workstation. This 
message is an excerpt from the admin_server.syslog file: 

03-Dec-98 17:47:52 Shutting down admin server 
03-Dec-98 17:48:15 kadmind: 
2503-001 Could not get local realm. 

Check that all the Kerberos files exist on the authentication server, which is 
usually the control workstation. Check the contents of the files to make sure 
they are not corrupted. Check the logs in /var/adm/SPlogs/kerberos for messages 
related to Kerberos daemons. 


16.7 Diagnosing system connectivity problems 

This section shows a few examples related to network problems. 

16.7.1 Problems with network commands 

If you cannot access the node using rsh, telnet, rlogin, or ping, you can 
access the node using the tty. This can be done by using the Hardware 
Perspectives, selecting the node, and performing an open tty action on it. It 
can also be done by issuing the s1term -w frame_number slot_number 
command, where frame_number is the frame number of the node, and 
slot_number is the slot number of the node. 

Using either method, you can login to the node and check the hostname, 
network interfaces, network routes, and hostname resolution to determine 
why the node is not responding. 

16.7.2 Problems with accessing the node 

If you cannot access the node using telnet or rlogin, but can access the 
node using ping, then this is a probable software error. Initiate a dump, record 
all relevant information, and contact the IBM Support Center. 

16.7.3 Topology-related problems 

If the ping and telnet commands are successful, but hostresponds still shows 
the node not responding, there may be something wrong with the Topology 
Services (hats) subsystem. Perform these steps: 

1. Examine the en0 (Ethernet adapter) and css0 (switch adapter) addresses 
on all nodes to see if they match the addresses in 
/var/ha/run/hats.partition_name/machines.lst. 

2. Verify that the netmask and broadcast addresses are consistent across all 
nodes. Use the ifconfig en0 and ifconfig css0 commands. 

3. Check the hats log file on the failing node with the command: 

# cd /var/ha/log 

# ls -lt | grep hats 

-rw-rw-rw- 1 root system 31474 Dec 07 09:26 hats.04.104612.sp3en0 
-rwxr-xr-x 1 root system 40 Dec 04 10:46 hats.sp3en0 
-rw-rw-rw- 1 root system 12713 Dec 04 10:36 hats.04.103622.sp3en0 
-rw-rw-rw- 1 root system 319749 Dec 04 10:36 hats.03.141426.sp3en0 
-rw-rw-rw- 1 root system 580300 Dec 04 03:13 hats.03.141426.sp3en0.bak 

4. Check the hats log file for the Group Leader node. Group Leader nodes 
are those that host the adapter whose address is listed below the line 
Group id in the output of the lssrc -ls hats command. 

5. Delete and add the hats subsystem with the following command on the 
CWS: 

# syspar_ctrl -c hats.sp3en0 

Then: 

# syspar_ctrl -A hats.sp3en0 

or, on the nodes: 

# syspar_ctrl -c hats 

Then: 

# syspar_ctrl -A hats 
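
The consistency check in step 2 lends itself to a small script once the ifconfig output from each node has been collected (for example, with dsh). The following Python sketch is an illustration only: it assumes the inet line layout shown in the sample data, and the node names, addresses, and function names are hypothetical.

```python
import re
from collections import Counter

# Assumed layout of the ifconfig "inet" line; adjust if your output differs.
INET_RE = re.compile(r"inet\s+(\S+)\s+netmask\s+(\S+)\s+broadcast\s+(\S+)")


def parse_ifconfig(text):
    """Return (address, netmask, broadcast) from ifconfig output, or None."""
    match = INET_RE.search(text)
    return match.groups() if match else None


def find_inconsistent(outputs):
    """Take {node: ifconfig text} and return a sorted list of suspect nodes.

    A node is suspect if its (netmask, broadcast) pair differs from the
    most common pair, or if its output could not be parsed.
    """
    settings = {}
    for node, text in outputs.items():
        parsed = parse_ifconfig(text)
        settings[node] = parsed[1:] if parsed else None
    counts = Counter(pair for pair in settings.values() if pair is not None)
    if not counts:
        return sorted(settings)
    majority = counts.most_common(1)[0][0]
    return sorted(node for node, pair in settings.items() if pair != majority)


# Made-up sample data: sp3n05 has a different netmask and broadcast address.
SAMPLE = {
    "sp3n01": "en0: flags=...\n\tinet 192.168.3.1 netmask 0xffffff00 broadcast 192.168.3.255",
    "sp3n05": "en0: flags=...\n\tinet 192.168.3.5 netmask 0xffff0000 broadcast 192.168.255.255",
    "sp3n09": "en0: flags=...\n\tinet 192.168.3.9 netmask 0xffffff00 broadcast 192.168.3.255",
}

if __name__ == "__main__":
    print(find_inconsistent(SAMPLE))
```

The majority vote is only a heuristic: it flags the odd node out, but it cannot tell you whether the majority setting itself is the intended one.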

16.8 Diagnosing 604 high node problems 

This section provides information on: 

• 604 high node characteristics, including: 

• Addressing power and fan failures in the 604 high node 

• Rebooting the 604 high node after a system failure 

• Error conditions and performance considerations 

• Using SystemGuard and BUMP programs 

16.8.1 604 high node characteristics 

The 604 high node operation is different from other nodes in several areas: 

• A power feature is available that adds a redundant internal power 
supply to the node. In this configuration, the node will continue to run in 
the event of a power supply failure. Error notification for a power supply 
failure is done through the AIX Error Log on the node. 

• The cooling system on the node also has redundancy. In the event that
one of the cooling fans fails, the node will continue to run. Error
notification for a fan failure is done through the AIX Error Log
on the node.

• If a hardware related crash occurs on the node, SystemGuard will 
re-IPL the node using the long IPL option. During long IPL, some CPU 
or memory resources may be deconfigured by SystemGuard to allow 
the re-IPL to continue. 

16.8.2 Error conditions and performance considerations 

You need to be aware of the following conditions that pertain to the unique 
operation of this node: 

• An error notification object should be set up on the node for the label 
EPOW_SUS. The EPOW_SUS label is used on AIX Error Log entries 
that may pertain to the loss of redundant power supplies or fans. 

• If the node is experiencing performance degradation, you should use
the lscfg command to verify that none of the CPU resources have been
deconfigured by SystemGuard, which may have re-IPLed the node using
the long IPL option.

16.8.3 Using SystemGuard and BUMP programs 

SystemGuard is a collection of firmware programs that run on the bringup 
microprocessor (BUMP). SystemGuard and BUMP provide service processor 
capability. They enable the operator to manage power supplies, check 
system hardware status, update various configuration parameters, 
investigate problems, and perform tests. 

The BUMP controls the system when the power is off or the AIX operating 
system is stopped. The BUMP releases control of the system to AIX after it is 
loaded. If AIX stops or is shut down, the BUMP again controls the system. 

To activate SystemGuard, the key mode switch must be in the SERVICE 
position during the standby or initialization phases. The standby phase is any 
time the system power is off. The initialization phase is the time when the 
system is being initialized. The PSSP software utilizes SystemGuard IPL 
flags, such as the FAST IPL default, when the netboot process starts. 

16.8.4 Problems with physical power-off 

If the 604 high node was physically powered off from the front panel power
switch and was not powered back on using the front panel switch, try the
following:

1. Using spmon, set the key to service mode.

2. Open a tty console with spmon -o node<node_number>.

3. At the > prompt, type sbb.

4. On the BUMP processor menu, choose option 5:

STAND-BY MENU : rev 17.03
0 Display Configuration
1 Set Flags
2 Set Unit Number
3 Set Configuration
4 SSbus Maintenance
5 I2C Maintenance
Select(x:exit): 5

5. On the I2C Maintenance menu, select option 08 (powering):

I2C Maintenance
00 rd OP status        05 wr LCD
01 rd UNIT status      06 rd i/o port SP
02 rd EEPROM           07 fan speed
03 margins             08 powering
04 on/off OP LEDs
Select(x:exit): 08

6. Select option 02 (unit ON) and unit 0:

powering
00 broadcast ON
01 broadcast OFF
02 unit ON
03 unit OFF
Select(x:exit): 02
Unit (0-7): 0

7. At this point, the power LED should indicate on (does not blink), but the
node will not power up.

8. Physically click the power switch (off and then on) on the node. The
node should now boot in SERVICE mode.

9. After the node boots successfully, use spmon -k normal <node_number>
on the CWS to set the node key position to NORMAL, power off the node
logically (not physically), and then power the node on.


16.9 Diagnosing switch problems 

In this section, we discuss typical problems related to the SP Switch that you 
should understand to prepare for your exam. If your system partition has an 
SP Switch failure with the following symptoms, perform the appropriate
recovery action described.

16.9.1 Problems with Estart failure 

Estart problems can have many different causes. In this section,
we discuss the following typical symptoms.

16.9.1.1 Symptom 1: System cannot find Estart command 

Software installation and verification is done using the css_test script from 
either the SMIT panel or from the command line. 

Run css_test from the command line. You can optionally select the following 
options: 

-q To suppress messages. 

-l To designate an alternate log file. 

Note that if css_test is executed following a successful Estart, additional 
verification of the system will be done to determine if each node in the system 
or system partition can be pinged. If you are using system partitions, css_test 
runs in the active partition only. 

Then review the default log file, which is located at
/var/adm/SPlogs/css/CSS_test.log, to determine the results.

Additional items to consider while trying to run css_test are as follows: 

• Each node should have access to the /usr/lpp/ssp directory. 

• /etc/inittab on each node should contain an entry for rc.switch. 

For complete information on css_test, see page 56 in PSSP: Command and
Technical Reference, SA22-7351.

16.9.1.2 Symptom 2: Primary node is not reachable 

If the node you are attempting to verify is the primary node, start with Step 1. 
If it is a secondary node, start with Step 2. 

1. Determine which node is the primary by issuing the Eprimary command on 
the CWS: 

Eprimary

returns

1 - primary
1 - oncoming primary
15 - primary backup
15 - oncoming primary backup

If the command returns an oncoming primary value of none, reexecute the 
Eprimary command specifying the node you would like to have as the 
primary node. Following the execution of the Eprimary command (to 
change the oncoming primary), an Estart is required to make the 
oncoming primary node the primary. 

If the command returns a primary value of none, an Estart is required to 
make the oncoming primary node the primary. 

The primary node on the SP Switch system can move to another node if a 
primary node takeover is initiated by the backup. To determine if this has 
happened, look at the values of the primary and the oncoming primary 
backup. If they are the same value, then a takeover has occurred. 

2. Ensure that the node is accessible from the control workstation. This can
be accomplished by using dsh to issue the date command on the node as
follows:

# /usr/lpp/ssp/rcmd/bin/dsh -w <problem hostname> date
TUE Oct 22 10:24:28 EDT 1997

If the current date and time are not returned, check for a Kerberos or
remote command problem.

3. Verify that the switch adapter (css0) is configured and is ready for
operation on the node. This can be done by interrogating the
adapter_config_status attribute in the switch_responds object of the SDR:

# SDRGetObjects switch_responds node_number==<problem node number>

returns

node_number switch_responds autojoin isolated adapter_config_status
1 0 0 0 css_ready

If the adapter_config_status attribute is anything other than css_ready, see
page 223 of RS/6000 SP: PSSP 2.2 Survival Guide, SG24-4928.

Note: To obtain the value to use for problem node number, issue an SDR 
query of the node_number attribute of the Node object as follows: 

# SDRGetObjects Node reliable_hostname==<problem hostname> 
node_number 

returns 

node_number 

1 

4. Verify that the fault_service_Worm_RTG_SP daemon is running on the
node. This can be accomplished by using dsh to issue a ps command to
the problem node as follows:

# dsh -w <problem_hostname> ps -e | grep Worm
18422 - 0:00 fault_service_Worm_RTG

If the fault_service_Worm_RTG_SP daemon is running, SP Switch node
verification is complete.

If the fault_service_Worm_RTG_SP daemon is not running, try to restart it
with: /usr/lpp/ssp/css/rc.switch
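
The rules in step 1 for reading the Eprimary output can be summarized in a few lines of code. This Python sketch is illustrative only: it assumes the "value - role" line layout shown in the example output, and the function names and the wording of the advice strings are hypothetical.

```python
def parse_eprimary(output):
    """Turn Eprimary output lines such as '1 - primary' into {role: value}."""
    roles = {}
    for line in output.splitlines():
        value, sep, role = line.partition(" - ")
        if sep:
            roles[role.strip()] = value.strip()
    return roles


def advise(roles):
    """Apply the rules from step 1 and return a list of findings."""
    notes = []
    primary = roles.get("primary")
    if roles.get("oncoming primary") == "none":
        notes.append("select an oncoming primary with Eprimary, then run Estart")
    elif primary == "none":
        notes.append("run Estart to make the oncoming primary the primary")
    # Same value for primary and oncoming primary backup means a takeover.
    if primary not in (None, "none") and primary == roles.get("oncoming primary backup"):
        notes.append("primary node takeover has occurred")
    return notes


# The example output from step 1.
SAMPLE = """1 - primary
1 - oncoming primary
15 - primary backup
15 - oncoming primary backup"""

if __name__ == "__main__":
    print(advise(parse_eprimary(SAMPLE)))
```

For the example output, no rule applies, so the list of findings is empty.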

16.9.1.3 Symptom 3: Estart command times out or fails 

Refer to the following list of steps to diagnose Estart failures: 

1. Log in to the primary node. 

2. View the bottom of the /var/adm/SPlogs/css/fs_daemon_print.file.

3. Use the failure listed as an index into Table 19 on page 133 of the
PSSP Diagnosis Guide, GA22-7350.

If the message from the /var/adm/SPlogs/css/fs_daemon_print.file is not
clear, we suggest that you do the following before contacting IBM Software
support:

• Check SDR with SDR_test. 

• Run SDRGetObjects switch_responds to read the SDR switch_responds
class and look for the values of the adapter_config_status attribute.

• Run Etopology -read <file_name>. Compare the output of the topology file
with the actual cabling and make sure all the entries are correct.

• Make sure the Worm daemon is up and running on all the nodes. Check 
the worm.trace file on the primary node for Worm initialization failure. 

• Make sure the Kerberos authentication is correct for all the nodes. 

• Run Eclock -d, and bring the Worm up on all nodes by executing the
/usr/lpp/ssp/css/rc.switch script.

• Change the primary node to a different node using the Eprimary 
command. In changing the primary node, it is better to select a node 
attached to a different switch chip from the original primary or even a 
different switch board. 

• Check whether all the nodes are fenced. Use the SDRChangeAttrValues
command as follows to unfence the primary and oncoming primary. Note
that the SDRChangeAttrValues command is dangerous if it is not used
properly. It is recommended that you archive the SDR before using this
command.

# SDRChangeAttrValues switch_responds node_number==<primary node_num>
isolated=0

• Now try Estart. If it fails, contact IBM Software support. 

16.9.1.4 Symptom 4: Some nodes or links not initialized 

When evaluating device and link problems on the system, first examine the
out.top file in the /var/adm/SPlogs/css directory of the primary node. This file
looks like a switch topology file except for the additional comments on lines
where either the device or link is not operational.

These additional comments are appended to the file by the fault_service 
daemon to reflect the current device and link status of the system. If there are 
no comments on any of the lines, or the only comments are for wrap plugs 
where they actually exist, you should consider all devices and links to be 
operational. If this is not the case, however, the following information should 
help to resolve the problem. 

The following is an example of a failing entry in the out.top file: 

s 14 2 tb3 9 0 E01-S17-BH-J32 to E01-N10 -4 R: device has been removed 
from network-faulty (link has been removed from network or 
miswired-faulty) 

This example means the following: 

• Switch chip 14, port 2 is connected to switch node number 9. 

• The switch is located in frame E01 slot 17. 

• Its bulkhead connection to the node is jack 32. 

• The node is also in frame E01, and its node number is 10. 

• The -4R refers to the device status of the right side device (tb0 9), which
has the more severe device status of the two devices listed. The device
status of the node is device has been removed from the network - faulty.

• The link status is link has been removed from the network or miswired
- faulty.

For a detailed list of possible device statuses for the SP Switch, refer to
pages 119-120 of the PSSP Diagnosis Guide, GA22-7350.
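
The field-by-field reading of the failing entry above can be reproduced with a short parser. This Python sketch is an illustration under stated assumptions: the regular expression is inferred from the single example entry in this section, not from a formal description of the out.top format, and the function and field names are hypothetical.

```python
import re

# Pattern inferred from the one example entry above; other entry types
# in a real out.top file will need their own handling.
ENTRY_RE = re.compile(
    r"^s\s+(?P<chip>\d+)\s+(?P<port>\d+)\s+"
    r"(?P<adapter>tb\d+)\s+(?P<switch_node>\d+)\s+\d+\s+"
    r"(?P<frame>E\d+)-S(?P<slot>\d+)-BH-J(?P<jack>\d+)\s+to\s+"
    r"(?P<node_frame>E\d+)-N(?P<node>\d+)"
    r"(?:\s+(?P<code>-?\d+)\s*(?P<side>[LR]):\s*(?P<device_status>[^(]*?)\s*"
    r"(?:\((?P<link_status>.*)\))?)?\s*$"
)

INT_FIELDS = ("chip", "port", "switch_node", "slot", "jack", "node", "code")


def parse_entry(line):
    """Return a dict of fields for a node entry, or None if it does not match."""
    match = ENTRY_RE.match(" ".join(line.split()))
    if match is None:
        return None
    fields = match.groupdict()
    for name in INT_FIELDS:
        if fields[name] is not None:
            fields[name] = int(fields[name])
    return fields


# The failing entry from the text, joined onto one line.
SAMPLE = (
    "s 14 2 tb3 9 0 E01-S17-BH-J32 to E01-N10 -4 R: device has been removed "
    "from network-faulty (link has been removed from network or miswired-faulty)"
)

if __name__ == "__main__":
    entry = parse_entry(SAMPLE)
    print(entry["chip"], entry["port"], entry["node"], entry["code"], entry["side"])
```

An entry with no trailing comment parses with code, side, device_status, and link_status set to None, which matches the rule that uncommented lines are considered operational.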

16.9.2 Problem with pinging to SP Switch adapter 

If the SP node fails to communicate over the switch (its switch_responds
is on, but the ping or css_test commands fail), check the following:

To isolate an adapter or switch error for the SP Switch, first view the AIX error
log. For switch related errors, log in to the primary node; for adapter
problems, log in to the suspect node. Once you are logged in, enter the
following:

# errpt | more 

ERROR_ID TIMESTAMP T CL Res Name ERROR_Description
34FFBE83 0604140393 T H Worm Switch Fault-detected by switch chip
C3189234 0604135793 T H Worm Switch Fault-not isolated

The Resource Name (Res Name) in the error log should give you an 
indication of how the failure was detected. For details, refer to Table 17 and 
Table 18 on pp.121-132 of the PSSP Diagnosis Guide, GA22-7350. 
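
When many entries have to be reviewed, the summary lines can be split into columns programmatically. The Python sketch below is a hedged illustration: it assumes the six-column layout of the sample output above with a single-word resource name, it relies on the errpt summary timestamp being in MMDDhhmmYY form, and the function and key names are hypothetical.

```python
from datetime import datetime


def parse_errpt_line(line):
    """Split one errpt summary line into its columns.

    Assumes a single-word resource name, as in the sample output.
    """
    ident, stamp, etype, eclass, resource, description = line.split(None, 5)
    return {
        "id": ident,
        # Summary timestamps are MMDDhhmmYY (two-digit year).
        "time": datetime.strptime(stamp, "%m%d%H%M%y"),
        "type": etype,
        "class": eclass,
        "resource": resource,
        "description": description.strip(),
    }


SAMPLE = "34FFBE83 0604140393 T H Worm Switch Fault-detected by switch chip"

if __name__ == "__main__":
    entry = parse_errpt_line(SAMPLE)
    print(entry["time"].isoformat(), entry["resource"], entry["description"])
```

Filtering the parsed entries on the resource name Worm then gives the switch-related errors discussed in this section.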

16.9.3 Problems with Eunfence 

The Eunfence command first distributes the topology file to the nodes before
they can be unfenced. If the command fails to distribute the topology file,
it puts an entry in the dist_topology.log file in the /var/adm/SPlogs/css
directory on the primary node.

The Eunfence command fails to distribute the topology file if the Kerberos
authentication is not correct.

The Eunfence command will time out if the Worm daemon is not running on the
node. So, before running the Eunfence command, make sure the Worm
daemon is up and running on the node. To start the Worm daemon on the
node, run the /usr/lpp/ssp/css/rc.switch script.

If the problem persists even though Kerberos authentication is correct and the
Worm daemon is running, the next step is to reboot the node. Then, try the
Eunfence command again.

If neither of the previous steps resolves the problem, you can run diagnostics
to isolate a hardware problem on the node.

The last resort, if all else fails, is to issue an Eclock command. This is
completely disruptive to the entire switch environment; so, it should only be
issued if no one is using the switch. An Estart must be run after Eclock
completes.

16.9.4 Problems with fencing primary nodes 

If the oncoming primary node becomes fenced from the switch, use the
following procedure to unfence it prior to issuing Estart:

• If the switch is up and operational with another primary node in control of 
the switch, then issue Eunfence on the oncoming primary, and issue Estart 
to make it the active primary node. 

[sp3en0:/]# Eunfence 1
All node(s) successfully unfenced.

[sp3en0:/]# Estart
Switch initialization started on sp3n01
Initialized 14 node(s).
Switch initialization completed.

• If the switch is operational, and Estart is failing because the oncoming 
primary's switch port is fenced, you must first change the oncoming 
primary to another node on the switch and Estart. Once the switch is 
operational, you can then Eunfence the old oncoming primary node. If you 
also want to make it the active primary, then issue an Eprimary command 
to make it the oncoming primary node and Estart the switch once again. 

[sp3en0:/]# Eprimary 5
Eprimary: Defaulting oncoming primary backup node to
sp3n15.msc.itso.ibm.com

[sp3en0:/]# Estart
Estart: Oncoming primary != primary, Estart directed to oncoming primary
Estart: 0028-061 Estart is being issued to the primary node:
sp3n05.msc.itso.ibm.com.
Switch initialization started on sp3n05.msc.itso.ibm.com.
Initialized 12 node(s).
Switch initialization completed.

[sp3en0:/]# Eunfence 1
All node(s) successfully unfenced.

[sp3en0:/]# Eprimary 1
Eprimary: Defaulting oncoming primary backup node to
sp3n15.msc.itso.ibm.com

[sp3en0:/]# Estart
Estart: Oncoming primary != primary, Estart directed to oncoming primary
Estart: 0028-061 Estart is being issued to the primary node:
sp3n01.msc.itso.ibm.com.
Switch initialization started on sp3n01.msc.itso.ibm.com.
Initialized 13 node(s).
Switch initialization completed.

• If the oncoming primary's switch port is fenced, and the switch has not
been started, you cannot check whether the node is fenced with the
Efence command. The only way you can see which nodes are fenced is
through the SDR. To check whether the oncoming primary is fenced,
issue:

# SDRGetObjects switch_responds

If you see that the oncoming primary node is isolated, the only way you can
change the SDR is through the SDRChangeAttrValues command. Before using
this command, do not forget to archive the SDR.

# SDRChangeAttrValues switch_responds node_number==<oncoming primary
node_number> isolated=0

# SDRGetObjects switch_responds node_number==<oncoming primary
node_number>

Then, issue the command: Estart
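
On a system with many nodes, picking the isolated nodes out of the SDRGetObjects switch_responds output by eye is error prone. The Python sketch below shows one way to do it, assuming the output is a header line naming the node_number and isolated columns followed by whitespace-separated rows, as in the example earlier in this chapter. The sample values and the function name are made up for illustration.

```python
def fenced_nodes(sdr_output):
    """Return the node numbers whose 'isolated' column is 1."""
    rows = [line.split() for line in sdr_output.splitlines() if line.strip()]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    node_col = header.index("node_number")
    isolated_col = header.index("isolated")
    return [int(row[node_col]) for row in data if row[isolated_col] == "1"]


# Made-up sample: node 1 is fenced (isolated=1), nodes 5 and 15 are not.
SAMPLE = """node_number switch_responds autojoin isolated adapter_config_status
1 0 0 1 css_ready
5 1 0 0 css_ready
15 1 0 0 css_ready
"""

if __name__ == "__main__":
    print(fenced_nodes(SAMPLE))
```

The sketch only reads the output; any change to the SDR still has to be made with SDRChangeAttrValues, after archiving the SDR, as described above.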


16.10 Impact of host name/IP changes on SP system 

In a distributed standalone RS/6000 environment, you simply update the
/etc/hosts file or DNS map file and reconfigure the adapters when you need to
change the host name or IP address. However, in an SP environment, the
task is not simple, and it affects the entire SP system. The IP
addresses and host names are stored in the System Data Repository (SDR)
using objects and attributes. They are also kept in
system-related files that are located on the SP nodes and the CWS.

This section describes the SDR classes and system files that are affected when
you change either the primary Ethernet IP address and host name for the SP
nodes or the CWS. We suggest that you avoid making any host name or IP address
changes if possible. The tasks are tedious and in some cases require
rerunning the SP installation steps. For detailed procedures, refer to Appendix
H in the PSSP Administration Guide, SA22-7348. These IP address and host
name procedures support SP nodes at PSSP levels PSSP 3.1 (AIX 4.3),
PSSP 2.4 (AIX 4.2 and 4.3), PSSP 2.3 (AIX 4.2 or 4.3), and PSSP 2.2
(AIX 4.1-4.2). The PSSP 3.1 release supports both SP node coexistence
and system partitioning.

Consider the following PSSP components when changing the IP address and 
hostnames: 

• Network Installation Manager (NIM) 

• System partitioning 

• IBM Virtual Shared Disk 

• High Availability Control Workstation (HACWS) 

• RS/6000 Cluster Technology (RSCT) Services 

• Problem management subsystem 

• Performance monitor services

• Extension nodes

• Distributed Computing Environment (DCE)

16.10.1 SDR objects with host names and IP addresses 

The following SDR objects reference the host name and IP address in the SP 
system for PSSP systems: 

• Adapter - Specifies the IP addresses used with the switch css0
adapter, or the Ethernet, FDDI, or token ring adapters.

• Frame - Specifies the Monitor and Control Nodes MACN and HACWS.

• backup_MACN - Attributes on the control workstation that work with
host names.

• JM_domain_info - Works with the host names for Resource Manager
domains.

• JM_Server_Nodes - Works with the host names for Resource Manager
server nodes.

• Node - Works with the initial or reliable host names and uses
the IP address for SP nodes and boot servers. The
nodes are organized by system partitions.

• Pool - Works with host names for Resource Manager pools.

• SP - Works with control workstation IP addresses and host
names. Uses the host name when working with
Network Time Protocol (NTP), printing, user
management, and accounting services.

• SP_ports - Works with the host name used with hardmon and the
control workstation.

• Switch_partition - Works with the host name for primary and backup
nodes used to support the css SP switch.

• Syspar - Works with the IP address and SP_NAME with system
partitions.

• Syspar_map - Provides the host name and IP address on the CWS
for system partitions.

• pmandConfig - Captures the SP node host name data working on
problem management.

• SPDM - Works with the host name for Performance Monitor
status data.

• SPDM_NODES - Works with the host name for SP nodes and organized
by system partition.

• DependentNode - Works with the host name for the dependent extension
node.

• DependentAdapter - Works with the IP address for the dependent
extension node adapter.

16.10.2 System files with IP addresses and host names 

The following files contain the IP address or host name that exists on the SP 
nodes and the control workstation. We recommend that you look through 
these files when completing the procedures for changing host names and IP 
addresses for your SP system. The following files are available for PSSP 
systems: 

• /.rhosts - Contains host names used exclusively with rcmd services. 

• /.klogin - Contains host names used with authentication rcmd services. 

• /etc/hosts - Contains IP addresses and host names used with the SP 
system. 

• /etc/resolv.conf - Contains the IP address for Domain Name Service 
(DNS) (Optional). 

• /var/yp/ NIS - References the host name and IP address with the Network 
Information Service (NIS). 

• /etc/krb5.conf - Works with the host name for DCE. 

• /etc/krb.conf - Works with the host name for the authentication server. 

• /etc/krb.realms - Works with the host name of the SP nodes and 
authentication realm. 

• /etc/krb-srvtab - Provides the authentication service key using host name. 

• /etc/SDR_dest_info - Specifies the IP address of the control workstation 
and the SDR. 

• /etc/ssp/cw_name - Specifies the IP address of control workstation host 
name on SP nodes that work with node installation and customization. 

• /etc/ssp/server_name - Specifies the IP address and host name of the SP 
boot/install servers on SP nodes working with node customization. 

• /etc/ssp/server_hostname - Specifies the IP address and host name of the 
SP install servers on SP nodes working with node installation. 

• /etc/ssp/reliable_hostname - Specifies the IP address and host name of
the SP node working with node installation and customization.

• /etc/ntp.conf - Works with the IP address of the NTP server (Optional). 

• /etc/filesystems - Can contain the IP address or host name of NFS 
systems (mainly used on /usr client systems). 

• /tftpboot/host.config_info - Contains the IP address and host name for
each SP node. It is found on the CWS and boot servers.

• /tftpboot/host.install_info - Contains the IP address and host name for
each SP node. It is found on the CWS and boot servers.

• /tftpboot/host-new-srvtab - Provides authentication service keys using
host name. It is found on the CWS and boot servers.

• /etc/rc.net - Contains the alias IP addresses used with system partitions. 

• /etc/niminfo - Works with the NIM configuration for NIM master 
information. 

• /etc/sysctl.acl - Uses host name that works with Sysctl ACL support. 

• /etc/logmgt.acl - Uses host name that works with Error Log Mgt ACL 
support. 

• /spdata/sys1/spmon/hmacls - Uses short host name that works with
hardmon authentication services.

• /etc/jmd_config.SP_NAME - Works with host names for Resource
Management on the CWS for all defined SP_NAME syspars.

• /usr/lpp/csd/vsdfiles/VSD_ipaddr - Contains the SP node IBM Virtual 
Shared Disk adapter IP address. 

• /spdata/sys1/ha/cfg/em.<SP_NAMEcdb>.<Data> - Uses Syspar host 
name that works with configuration files for Event Management services. 

• /var/ha/run/ Availability Services - Uses Syspar host name that contains 
the run files for the Availability Services. 

• /var/ha/log/ Availability Services - Uses Syspar host name that contains 
the log files for the Availability Services. 

• /var/adm/SPlogs/pman/data - Uses Syspar host name that contains the
log files for the Problem Management subsystem.

• /etc/services - Specifies short host name based on SP_NAME partition 
that work with Availability Services port numbers. 

• /etc/auto/maps/auto.u - Contains host names of the file servers providing 
NFS mounts to Automount. 

• /etc/amd/amd-maps/amd.u - Contains host names of the file servers
providing NFS mounts to AMD.
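
Looking through the files listed above by hand is tedious, so a small search helper can make a first pass. The Python sketch below is only an illustration: the function name is hypothetical, the matching is a plain substring search, and it does not replace the documented procedures in the PSSP Administration Guide.

```python
from pathlib import Path


def find_references(paths, needles):
    """Return {path: [(line_number, line)]} for lines containing any needle.

    Missing or unreadable paths (including directories) are skipped.
    """
    hits = {}
    for path in map(Path, paths):
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            if any(needle in line for needle in needles):
                hits.setdefault(str(path), []).append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # A few of the files from the list above; the old host name and
    # address searched for here are made-up examples.
    candidates = ["/etc/hosts", "/etc/krb.conf", "/etc/SDR_dest_info", "/etc/ntp.conf"]
    for name, lines in find_references(candidates, ["sp3n01", "192.168.3.1"]).items():
        for number, line in lines:
            print(f"{name}:{number}: {line}")
```

Because the search is a substring match, a short needle such as sp3n01 also matches sp3n010, so review the hits rather than editing them blindly.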


16.11 Related documentation 

The following documents are recommended for understanding the topics in
this chapter and for details on its recovery procedures.

SP Manuals 

This chapter introduces a summary of general problem diagnosis to prepare 
for the exam. Therefore, you should read Part 2 of the PSSP Diagnosis 
Guide, GA22-7350, for a full description. In addition, you may read Chapters 
4, 5, 8, 12, and 14 of the PSSP: Administration Guide, SA22-7348, to get the 
basic concepts of each topic we discuss here. 

SP Redbooks 

There is no problem determination redbook available for PSSP 2.4. You can
use RS/6000 SP: PSSP 2.2 Survival Guide, SG24-4928, for PSSP 2.2. This
redbook discusses node installation and SP switch problems in great
detail.


16.12 Sample questions 

This section provides a series of questions to help you prepare for
the certification exam. The answers to these questions can be found in
Appendix A.

1. During PSSP 2.4 installation, the setup_server script returns the following 
error: 

mknimres: 0016-395 Could not get size of
/spdata/sys1/install/pssplpp/7[1]/pssp.installp on control workstation

You could correct the error by issuing:

A. mv ssp.usr.2.4.0.0 /spdata/sys1/install/pssplpp/ssp.installp

B. mv ssp.usr.2.4.0.0 /spdata/sys1/install/pssplpp/pssp.installp

C. mv ssp.usr.2.4.0.0 /spdata/sys1/install/pssplpp/pssp-2.4/ssp.installp

D. mv ssp.usr.2.4.0.0 /spdata/sys1/install/pssplpp/PSSP-2.4/pssp.installp

2. Select one problem determination/problem source identification 
methodology statement to resolve this situation: 





You discover you are unable to log in to one of the nodes with any ID 
(even root) over any network interface OR the node's TTY console. You 
begin recovery by booting the node into maintenance, getting a root shell 
prompt, and... 

A. 1) Run the df command, which shows 100 percent of the node's critical 
filesystems are used. Clear up this condition. 

2) Realize that supper may have updated the /etc/passwd file to a 0
length file. Correct /etc/passwd.

3) Reboot to Normal mode. 

4) Run supper update on the node. 

5) Now all IDs can log in to the node. 

B. 1) Check permissions of the /etc/passwd file to see if they are correct.

2) Check the /etc/hosts file: all host lines show three duplicate entries.
Edit out these duplicate entries.

3) Reboot to Normal mode.

4) Now all IDs can log in to the node.

C. 1) Check name resolution and TCP/IP (ping, telnet) functions to/from the
nodes. No problems.

2) On CWS: Check if hardmon is running. It is not; so, restart it.

3) Correcting hardmon allows login of all IDs to the node.

D. 1) Check if Kerberos commands work. They do.

2) Check TCP/IP (telnet, ping). It does not work.

3) Fix TCP/IP access by:

# /usr/lpp/ssp/rcmd/bin/rsh /usr/lpp/ssp/rcmd/ \ 
bin/rcp spcwl:/etc/passwd /etc/passwd 

# /usr/lpp/ssp/rcmd/bin/rsh /usr/lpp/ssp/rcmd/ \ 
bin/rcp spcwl:/etc/hosts /etc/hosts 

4) Now all users can log in to the node. 

3. Apart from a client node being unable to obtain new tickets, the loss of the 
CWS will not stop normal operation of the SP complex: 

• True 

• False 

4. If a supper update returned the message could not connect to server, the 
cause would most likely be: 

A. The supfilesrv daemon is not running and should be restarted.

B. The SDR_dest_info file is missing and should be recreated. 

C. The root file system on the node is full. 

D. There is a duplicate IP address on the SP Ethernet. 

5. If a user running a Kerberized rsh command receives a message including
the text couldn't decode authenticator, the most probable solutions would
be: (More than one answer is correct.)

A. Remove the .rhosts file. 

B. Check that the time is correct and reset it if not. 

C. Generate a fresh krb-srvtab file for the problem server. 

6. After having renamed the ssp.usr fileset to the appropriate name, you
receive an error message from setup_server that says the fileset
indicated could not be found. You should check that:

A. The ssp.usr fileset is present.

B. The table of contents for the /spdata/sys1/install/images directory.

C. The .toc file for the pssplpp subdirectory mentioned is up to date.

D. The correct file permissions on the /usr spot are set to 744.


Appendix A. Answers to sample questions


This appendix contains the answers and a brief explanation to the sample 
questions included in every chapter. 


A.1 Hardware validation and software configuration 

Answers to questions in 2.17, “Sample questions” on page 71, are as follows: 

Question 1 - The answer is B. Although primary backup nodes are 
recommended for high availability, it is not a requirement for switch 
functionality or for the SP Switch router node. In the event of a failure in the 
primary node, the backup node can take over the primary duties so that new 
switch faults can continue being processed. For more information on this, 
refer to 2.5, “Dependent nodes” on page 25. 

Question 2 - The answer is B. The two switch technologies (SP Switch and
HiPS) are not compatible. PSSP 2.4 is the last PSSP level that supports the
HiPS switch. PSSP 3.1 or later does not support the older switch.

Question 3 - The answer is A. PSSP 3.1 requires AIX 4.3.2 or later. The 
Performance Toolbox manager extension (perfagent.server fileset) is no 
longer a prerequisite in PSSP 3.1. Refer to 2.12, “Software requirements” on 
page 51 for details. 

Question 4 - The answer is A. The new PCI thin nodes (both PowerPC and 
POWER3 versions) have two PCI slots available for additional adapters. The 
Ethernet and SCSI adapters are integrated. The switch adapter uses a 
special MX (mezzanine bus) adapter (MX2 for the POWER3 based nodes). 
For more information, refer to 2.4.1, “Internal nodes” on page 14. 

Question 5 - The answer is C. The SP-Attached server requires a minimum 
of four connections with the SP system in order to establish a functional and 
safe network. If your SP system is configured with an SP Switch, there will be 
five required connections. 

Question 6 - The answer is B. The CWS acts as a boot/install server for other 
servers in the RS/6000 SP system. In addition, the control workstation can be 
set up as an authentication server using Kerberos. As an alternative, the 
control workstation can be set up as a Kerberos secondary server with a 
backup database to perform ticket-granting service. 

Question 7 - The answer is B. For more information, refer to section 2.6.2,
“Control Workstation Minimum Hardware Requirements” on page 31.

Question 8 - The answer is D. The hardware components that make up the 
SP Switch network are: The Switch Link, the Switch Port, The Switch Chip, 
the Switch Adapter, and the Switch Board. 

Question 9 - The answer is C. By default, the control workstation is the
boot/install server. The CWS is responsible for AIX and PSSP software
installations to the nodes. You can also define other nodes to be a boot/install
server. The minimum requirement to off-load the CWS as the only boot/install
server is to define one boot/install server in either frame 1 or frame 2. For
more information on boot/install servers, refer to section 2.7, “Boot/Install
Server Requirements” on page 34.

Question 10 - The answer is B. A short frame supports only a single SP
Switch-8 board. For more information, refer to section 2.14.1.2, “SP Switch-8
Short Frame Configurations” on page 57.


A.2 RS/6000 SP networking 

Answers to questions in 3.6, “Sample questions” on page 103, are as follows: 

Question 1 - The answer is D. Hardware control is done through the serial 
connection (RS-232) between the control workstation and each frame. 

Question 2 - The answer is B. The reliable hostname is the name associated 
with the en0 interface on every node. The initial hostname is the hostname of 
the node. The reliable hostname is used by the PSSP components in order to 
access the node. The initial hostname can be set to a different interface (for 
example, the css0 interface) if applications need it. 

Question 3 - The answer is B. If the /etc/resolv.conf file exists, AIX will follow 
a predefined order with DNS first. The default order can be 
altered by creating the /etc/netsvc.conf file. 

Question 4 - The answer is D. In a single segment network, the control 
workstation is the default route and default boot/install server for all the 
nodes. When multiple segments are used, the default route for nodes will not 
necessarily be the control workstation. The boot/install server (BIS) is 
selected based on network topology; however, for a node to install properly, it 
needs access to the control workstation even when it is being installed from a 
BIS other than the control workstation. In summary, every node needs a 
default route, a route to the control workstation, and a boot/install server in its 
own segment. 

Question 5 - The answer is C. A netmask of 255.255.255.224 provides 30 
discrete addresses per subnet. 
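
The arithmetic behind this answer can be checked with a short sketch. It uses Python's standard ipaddress module; the function name usable_hosts is made up for this illustration, and the subtraction of two addresses (network and broadcast) assumes an ordinary subnet with a prefix of /30 or shorter.

```python
import ipaddress


def usable_hosts(netmask: str) -> int:
    """Return the number of assignable host addresses in one subnet."""
    # Any base address works here; only the mask length matters.
    network = ipaddress.ip_network(f"0.0.0.0/{netmask}")
    # Exclude the network address and the broadcast address.
    return network.num_addresses - 2


# 255.255.255.224 leaves 5 host bits: 2**5 = 32 addresses, 30 usable.
print(usable_hosts("255.255.255.224"))
```

The same function reports 254 for a netmask of 255.255.255.0.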

Question 6 - The answer is C. Ethernet, Fiber Distributed Data Interface 
(FDDI), and token-ring are configured by the SP. Other network adapters 
must be configured manually. 

Question 7 - The answer is C. The default order in resolving host names is: 
BIND/DNS, NIS, and local /etc/hosts file. The default order can be 
overridden by creating a configuration file, called /etc/netsvc.conf, and 
specifying the desired order. 
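
As an illustration only, the following sketch shows how a lookup order could be derived from the contents of such a file. It assumes the file holds a line of the form "hosts = local, bind" and that the keywords bind, nis, and local name the three methods listed above; the function and constant names are invented for this example and are not part of AIX or PSSP.

```python
from typing import List

# Default order described above: BIND/DNS, NIS, then the local /etc/hosts file.
DEFAULT_ORDER = ["bind", "nis", "local"]


def resolution_order(netsvc_text: str) -> List[str]:
    """Return the host lookup order from netsvc.conf-style text (assumed syntax)."""
    for raw in netsvc_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "hosts":
            return [name.strip() for name in value.split(",") if name.strip()]
    # No hosts entry found: fall back to the default order.
    return list(DEFAULT_ORDER)


print(resolution_order("hosts = local, bind"))
print(resolution_order(""))
```

With no hosts entry, the sketch falls back to the default order, which mirrors the behavior described in the answer.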

Question 8 - The answer is D. There are four basic daemons that NIS uses: 
ypserv, ypbind, yppasswd, and ypupdated. NIS was initially called yellow 
pages; hence, the prefix yp is used for the daemons. 

Question 9 - The answer is C. A NIS server is a machine that provides the 
system files to be read by other machines on the network. There are two 
types of servers: Master and Slave. Both keep a copy of the files to be shared 
over the network. A master server is the machine where a file may be 
updated. A slave server only maintains a copy of the files to be served. A 
slave server has three purposes: To balance the load if the master server is 
busy, to back up the master server, and to enable NIS requests if there are 
different networks in the NIS domain. 

Question 10 - The answer is D. NIS serves files in the form of maps. There is 
a map for each of the files that it serves. Information from the file is stored in 
the map, and it is the map that is used to respond to client requests. Refer to 
3.2.5, “NIS” on page 79 for details. 


A.3 I/O devices and file systems 

Answers to questions in 4.6, “Sample questions” on page 141, are as follows: 

Question 1 - The answer is C. Nodes are independent machines. Any 
peripheral device attached to a node can be shared with other nodes in 
the same way as stand-alone machines can share resources on a network. 
The SP Switch provides a very high bandwidth that makes it an excellent 
communication network for massive parallel processing. 

Question 2 - The answer is C. Only MicroChannel nodes support external 
SSA booting. The reason is that no PCI SSA adapters have been tested to 
certify external booting support. This is true at the time of this writing, but it 
may change by the time you read this. Refer to 4.3.4, “Booting from external 
disks” on page 126 for details. 

Question 3 - The answer is A. PSSP 3.1 supports multiple rootvg definitions 
per node. Before you can use an alternate rootvg volume group, you need to 
install the alternate rootvg in an alternate set of disks. To activate it, you have 
to modify the boot list on that node. PSSP provides a command to modify 
the boot list remotely; it is spbootlist. Refer to 4.3.2.8, “spbootlist” on page 
124 for details. 

Question 4 - The answer is B. The boot/install server is a NFS server for 
home directories. You can set a node to be a NFS server for home 
directories, but this does not depend on that node being a boot/install server. 
The control workstation is always a NFS server even in cases where all 
nodes are being installed from boot/install servers other than the control 
workstation. The control workstation always NFS exports the lppsource 
resources to all nodes. 

Question 5 - The answer is A. spmirrorvg enables mirroring on a set of nodes 
given by the option -l node_list. You can force the extension of the Volume 
Group by using the -f option (available values are: yes or no). Refer to 
4.3.2.5, “spmirrorvg” on page 121. 

Question 6 - The answer is A. splstdata can now display information about 
Volume_Groups. Refer to 4.3.2.7, “Changes to splstdata in PSSP 3.1 or 
Later” on page 123. 

Question 7 - The answer is B. It is not recommended to use NFS in large 
production environments that require fast, secure, and easy to manage global 
file systems. On the other hand, NFS administration is fairly easy, and small 
environments with low security requirements will probably choose NFS as 
their global file system. 

Question 8 - The answer is C. DFS is a distributed application that manages 
file system data. It is an application of DCE that uses almost all of the DCE 
services to provide a secure, highly available, scalable, and manageable 
distributed file system. DFS data is organized in three levels: Files and 
directories, filesets, and aggregates. Refer to 4.4.2.1, “What is the Distributed 
File System?” on page 136. 

Question 9 - The answer is C. The following are the other default values: The 
default install_disk is hdisk0, quorum is true, mirroring is off, copies are set to 
1, there are no bootable alternate root Volume Groups, and all other 
attributes of the Volume_Groups are initialized according to the same rules as 
the Node object. Refer to 4.3.1.2, “Volume_Group Default Values” on page 
115. 

Question 10 - The answer is B. Refer to 4.3.2.1, “spmkvgobj” on page 116. 


A.4 SP-attached server support 

Answers to questions in 5.8, “Sample questions” on page 180, are as follows: 

Question 1 - The answer is D. Each SP-attached server must be connected 
to the control workstation through two RS-232 serial links and an Ethernet 
connection. One of the RS-232 lines connects the control workstation with the 
front panel of the SP-attached server and uses a System and Manufacturing 
Interface protocol (SAMI). The other line goes to the back of the CEC unit and 
attaches to the first integrated RS-232 port in the SP-attached server. This 
line serves as the s1term emulator. Remember that login must be enabled in 
that first integrated port (S1) in order for s1term to work. Refer to 5.2.2, 
“SP-Attached server attachment” on page 147 for details. 

Question 2 - The answer is D. SP-attached servers cannot be installed 
between switched frames and expansion frames. Although SP-attached 
servers can be placed anywhere in the SP complex because they do not 
follow the rules of standard SP frames, this restriction comes from the 
expansion frame itself. All expansion frames for frame n must be numbered 
n+1, n+2, and n+3. Refer to 2.14, “Configuration rules” on page 55 for details. 

Question 3 - The answer is B. SP-attached servers do not have frame or 
node supervisor cards, which limits the capabilities of the hardmon daemon 
to monitor or control these external nodes. Most of the basic hardware control 
is provided by the SAMI interface; however, most of the monitoring 
capabilities are provided by an internal sensor connected to the node 
supervisor cards. So, the lack of node supervisor cards on SP-attached 
servers limits those monitoring capabilities. 

Question 4 - The answer is B. The s70d daemon is started and controlled by 
the hardmon daemon. Each time the hardmon daemon detects a SAMI frame 
(an SP-attached server seen as a frame), it starts a new s70d process. The 
hardmon daemon will keep a socket connection with this s70d. The s70d will 
translate the commands coming from the hardmon daemon into SAMI 
commands. Refer to 5.4.2, “Hardmon” on page 166 for details. 

Question 5 - The answer is D. The SP-Attached server cannot be the first 
frame in the SP system. So, the first frame in the SP system must be an SP 
frame containing at least one node. This is necessary for the SDR_config 
code, which needs to determine whether the frame is with or without a switch. 

Question 6 - The answer is B. The new SP-Attached server does not have a 
frame or node supervisor card that can communicate with the hardmon 
daemon. Therefore, a new mechanism to control and monitor SP-Attached 
servers is provided in PSSP 3.1. Hardmon provides support for the 
SP-Attached server in the following way: It discovers the existence of 
SP-Attached servers, and it controls and monitors the state of SP-Attached 
servers, such as power on/off. 

Question 7 - The answer is C. Your SP system must be operating with a 
minimum of PSSP 3.1 and AIX 4.3.2 before you can place the SP-Attached 
server into service. 

Question 8 - The answer is A. The SP-Attached server occupies the slot one 
position. 

Question 9 - The answer is B. For the S70 server, only the 10Mbps BNC or 
the 10Mbps AUI Ethernet adapters are supported for SP-LAN communication. 
BNC adapters provide the BNC cables, but the AUI Ethernet adapter does 
not provide the twisted pair cables. 

Question 10 - The answer is D. Refer to 5.5.1.1, “System Management” on 
page 173 for details. 


A.5 SP security 

Answers to questions in 6.16, “Sample questions” on page 215, are as 
follows: 

Question 1 - The answer is C. The rc.sp script runs every time a node boots. 
This script checks the Syspar class in the SDR and resets the authentication 
mechanism based on the attributes in that class. Using the chauthent 
command directly on a node will cause the node to be in an inconsistent state 
with the rest of the system, and the change will be lost by the time of the next 
boot. It is recommended not to change the authentication setting directly on 
the node but through the use of PSSP command and SDR settings. 

Question 2 - The answer is D. One of the reasons why PSSP 3.1 still 
requires Kerberos v4, although it supports, through AIX, Kerberos v5, is the 
fact that the hardmon daemon and the sysctl facility still require Kerberos v4 
for authentication. Refer to 6.12, “SP services that utilize Kerberos” on page 
202 for details. 

Question 3 - The answer is D. The /etc/krb-srvtab file contains the private 
password for the Kerberos services on a node. This is a binary file, and its 
content can be viewed by using the klist -srvtab command. By default, the 
hardmon and the remote command (rcmd) principals maintain their private 
passwords in this file. Refer to 6.10, “Server key” on page 200 for details. 

Question 4 - The answer is A. Although the SP Perspectives uses services 
that are Kerberos clients, the interface itself is not a Kerberos client. Event 
Perspectives requires you to have a valid Kerberos principal to generate 
automatic actions upon receiving event notifications (this facility is provided 
by the problem management subsystem). The Hardware Perspective requires 
you to have a valid Kerberos principal in order to access the hardware control 
monitoring facilities that are provided by the hardmon daemon. The VSD 
Perspective requires you to have a valid Kerberos principal to access the 
VSD functionality because the VSD subsystem uses sysctl for control and 
monitoring of the virtual shared disk, nodes, and servers. 

Question 5 - The answers are A and D. Two service names are used by the 
Kerberos-authenticated applications in an SP system: hardmon, used by the 
System Monitor daemon on the control workstation and by logging daemons, and 
rcmd, used by sysctl. 

Question 6 - The answer is B. One of the procedures to add a Kerberos 
Principal is to use the mkkp command. This command is non-interactive and 
does not provide the capability to set the principal’s initial password. The 
password must, therefore, be set by using the kadmin command and its 
subcommand, cpw. Refer to 6.9.1, “Add a Kerberos Principal” on page 194 for 
more details. 

Question 7 - The answer is D. On the SP, there are three different sets of 
services that use Kerberos authentication: The hardware control subsystem, 
the remote execution commands, and the sysctl facility. 

Question 8 - The answer is C. PSSP supports the use of an existing AFS 
server to provide Kerberos Version 4 services to the SP. Usage of AFS on SP 
systems is optional. 

Question 9 - The answer is B. The three Kerberos daemons are kerberos, 
kadmind, and kpropd. 

Question 10 - The answer is D. The kstash command kills and restarts the 
kadmin daemon, and recreates the /.k file to store the new master key in it. 


A.6 User and data management 

Answers to questions in 7.8, “Sample questions” on page 244, are as follows: 

Question 1 - The answer is C. If you are using the SP User Management 
facilities, File Collection will automatically replace the /etc/passwd and the 
/etc/security/passwd files every other hour. This makes it possible to have 
global SP users by having a common set of user files across nodes. The 
passwd command gets replaced by a PSSP command that will prompt the user 
to change its password on the control workstation, which is the password 
server by default. 

Question 2 - The answer is C. SP users are global AIX users managed by 
the SP User Management facility (SPUM). All the user definitions are 
common across nodes. The SPUM provides mechanisms to NFS mount a 
home directory on any node and to provide the same environment to users no 
matter where they log in. Refer to 7.3, “SP User data management” on 
page 220 for details. 

Question 3 - The answer is D. The spacs_cntrl command is used to set 
access control on a node. This command must be executed on every node 
where you want to restrict user access, for example, to run batch jobs without 
users sniffing around. Refer to 7.3.6, “Access control” on page 224 for details. 

Question 4 - The answer is B. All the user related configuration files are 
managed by the user.admin file collection. This collection is defined by 
default and is activated when you select the SPUM as your user 
management facility. Refer to 7.5.3.2, “user.admin collection” on page 230 for 
details. 

Question 5 - The answer is D. PSSP is shipped with four predefined file 
collections: sup.admin, user.admin, power_system, and node.root. 

Question 6 - The answer is B. The supper command is used to report 
information about file collections. It has a set of subcommands to perform 
files and directories management that includes verification of information and 
the checking of results when a procedure is being performed. 

Question 7 - The answer is C. NIS allows a system administrator to maintain 
system configuration files in one place. These files only need to be changed 
once then propagated to the other nodes. Refer to 7.4, “Configuring NIS” on 
page 222 for details. 

Question 8 - The answer is D. The default hierarchy of updates for file 
collections is in the following sequence: CWS/BIS/Nodes. However, the 
default hierarchy can be changed. Refer to 7.5.9, “Modifying the File 
Collection Hierarchy” on page 236 for details. 

Question 9 - The answer is C. Make sure you are working with the master 
files. Refer to 7.5.8.4, “Adding and Deleting files in a File Collection” on page 
235 for details. 

Question 10 - The answer is B. AIX Automounter is a tool that can make the 
RS/6000 SP system appear as only one machine to both the end users and 
the applications by means of a global repository of storage. It manages 
mounting activities using standard NFS facilities. It mounts remote systems 
when they are used and automatically dismounts them when they are no 
longer needed. 


A.7 Configuring the control workstation 

Answers to questions in 8.8, “Sample questions” on page 264, are as follows: 

Question 1 - The answer is D. The partition-sensitive daemons are controlled 
by the syspar_ctrl command. The install_cw script will not create or start 
those daemons. Refer to 8.3.2, “install_cw” on page 253 for details. 

Question 2 - The answer is B. In the release prior to PSSP 3.1, the System 
Performance Measurement Interface (SPMI) library was required by some 
PSSP components. This library was packaged as part of the Performance 
Toolbox Aide (PAIDE) package companion of the Performance Toolbox for 
AIX. In AIX 4.3.2, which is a prerequisite for PSSP 3.1, the SPMI library is 
shipped in the perfagent.tools fileset and not in the perfagent.server 
component as in previous releases. Although most of the PSSP components 
will not use the SPMI library, the aixos resource monitor will need it in order 
to provide resource variables to Event Management. In summary, the 
perfagent.tools component is a prerequisite for PSSP 3.1 running on AIX 
4.3.2. 

Question 3 - The answer is A. The RS/6000 Cluster Technology (RSCT) is a 
prerequisite for the PSSP. It was packaged as ssp.ha in releases prior to 
PSSP 3.1. These filesets provide the program and the configuration files for 
the three key components within the RS/6000 SP (Topology Services, Group 
Services, and Event Management). 

Question 4 - The answer is D. The /etc/rc.net file is the recommended 
location for setting any static routing information. In the case where the CWS 
and all nodes' en0 adapters are not on the same Ethernet segment, the 
/etc/rc.net file of the CWS can be modified to include a routing statement. 

Question 5 - The answer is A. The monitoring and control of the SP frames 
and nodes hardware from the CWS requires a serial connection between the 
CWS and each frame in the SP system. If there are many frames, there may 
not be enough built-in serial adapters on the CWS and additional serial 
adapter cards may need to be installed in the CWS. 

Question 6 - The answer is C. The setup_authent command has no arguments. It 
configures the Kerberos authentication services for the SP system. The 
command first searches the AIX system for Kerberos services already 
installed, checks for the existence of Kerberos configuration files, and then 
enters an interactive dialog where you are asked to choose and customize 
the authentication method to use for the management of the SP system. 

Question 7 - The answer is B. In the case that SP-Attached servers are 
included in the SP system, two serial cables are needed to link the CWS to 
each of the servers. An Ethernet connection is also mandatory between the 
CWS and the server configured on the en0 adapter of the server. 

Question 8 - The answer is C. The required filesets for PSSP 3.1 with an SP 
Switch are ssp.css and ssp.top. 

Question 9 - The answer is A. The installable images (lpp) of the AIX 
systems must be stored in directories named 
/spdata/sys1/install/<source_name>/lppsource. You can set <source_name> 
to the name you prefer. However, it is recommended to use a name identifying 
the version of the AIX lpps stored in this directory. The names generally 
used are aix421, aix431, and so on. 
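
The naming convention can be written out as a small helper. This is only a sketch of the path layout described above: the function name lppsource_dir is hypothetical, and the base directory is taken to be /spdata/sys1/install.

```python
from pathlib import PurePosixPath


def lppsource_dir(source_name: str) -> PurePosixPath:
    """Build the lppsource directory path for a given source name."""
    if not source_name or "/" in source_name:
        # The source name is a single directory component, such as aix431.
        raise ValueError("source_name must be a single directory name")
    return PurePosixPath("/spdata/sys1/install") / source_name / "lppsource"


print(lppsource_dir("aix431"))
```

Using a name such as aix431 keeps the AIX level visible in the path, which is the reason for the recommendation in the answer.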

Question 10 - The answers are C and D. perfagent.tools and 
perfagent.server 2.2.32.x are required filesets for AIX 4.3.2 and PSSP 2.4. 


A.8 Frames and nodes installation 

Answers to questions in 9.5, “Sample questions” on page 294, are as follows: 

Question 1 - The answers are A and C. The initial hostname is the real host 
name of a node, while the reliable hostname is the hostname associated to 
the en0 interface on that node. Most of the PSSP components will use the 
reliable hostname for accessing PSSP resources on that node. The initial 
hostname can be set to a faster network interface (such as the SP Switch) if 
applications use the node’s hostname for accessing resources. 

Question 2 - The answer is C. The spsvrmgr command is used for checking 
frame and node supervisor microcode levels. The -G flag will contact all the 
frame supervisor cards in the system. Refer to 9.2.3, “Check the level of 
supervisor microcode” on page 270 for details. 

Question 3 - The answer is D. A boot/install server is defined when nodes 
have their install server field pointing to a particular node. By default, the 
control workstation is the boot/install server to all nodes, but in a multi-frame 
environment, PSSP will choose the first node in each frame to be the 
boot/install server for the nodes in that frame. The spbootins command will 
run the setup_server script remotely in any boot/install server node. 

Question 4 - The answer is A. The nodecond script runs on the control 
workstation, and it accesses each node’s console through a read/write 
s1term. The s1term uses the RS-232 line to the frame for opening the 
console of a node in that frame. The Ethernet network is not used until the 
node starts network booting after the nodecond script has selected all the 
necessary options in the network boot menu. 

Question 5 - The answer is D. The spadaptrs command is used to configure 
additional adapters into the SDR. It executes on the CWS only using the 
command line interface or the equivalent functions accessible from the SMIT 
Additional Adapter Database Information window (smitty add_adapt_dialog). 

Question 6 - The answer is C. The syspar_ctrl command controls the 
system partition sensitive subsystems on the CWS and on the SP nodes. This 
command will start the daemons: hats, hags, haem, hr, pman, emon, 
spconfigd, emcond, and spdmd (optional). Since the daemons need to 
execute on all machines of the SP system for the subsystem to run 
successfully, syspar_ctrl -a must also be executed on each node when it is 
up. 

Question 7 - The answer is C. The customization of the boot/install server 
(setup_server command) creates several files in the /tftpboot directory. Refer 
to 9.3.2, “/tftpboot” on page 284 for details. 

Question 8 - The answer is B. The s1term command is a very useful 
command to take control of a node when the IP connection through the 
Ethernet network is not available. Refer to 9.2.20, “s1term” on page 279 for 
details. 

Question 9 - The answer is B. The splstdata command displays 
configuration information stored in the SDR. This command executes in the 
CWS or any SP node when using the command line interface. 

Question 10 - The answer is D. The setup_server command configures the machine 
where it is executed (CWS or SP node) as a boot/install server. This 
command has no argument. It executes on the CWS and any additional 
boot/install servers. Refer to 9.2.15, “Configure the CWS as Boot/Install 
Server” on page 277 for details. 


A.9 Verification commands and methods 

Answers to questions in 10.8, “Sample questions” on page 311, are as 
follows: 

Question 1 - The answer is D. The SDR_test script checks the SDR and 
reports any errors found. It will contact the SDR daemon and will try to create 
and remove classes and attributes. If this test is successful, then the SDR 
directory structure and the daemons are set up correctly. Refer to 10.3.1.2, 
“Checking the SDR initialization: SDR_test” on page 298 for details. 

Question 2 - The answer is D. The spmon -d command will contact the frame 
supervisor card only if the -g flag is used. If this flag is not used, the spmon -d 
command will only report node information. Refer to 10.3.4.2, “Monitoring 
hardware activity: spmon -d” on page 303 for details. 

Question 3 - The answer is B. The hardmon daemon is not a 
partition-sensitive daemon. There is only one daemon running on the control 
workstation at any time even though there may be more than one partition 
configured. The daemon uses the RS-232 lines to contact the frame 
supervisor cards every five seconds by default. 

Question 4 - The answer is C. The worm is started by the rc.switch script, 
which is started at node boot time. 

Question 5 - The answer is B. The SYSMAN_test command is a very powerful 
test tool. It checks a large number of SP system management components. 
The command is executed on the CWS, but it does not restrict its checking to 
components of the CWS. If nodes are up and running, it will also perform 
several tests on them. 


A.10 Understanding additional SP-related products 

Answers to questions in 11.8, “Sample questions” on page 325, are as 
follows: 

Question 1 - The answer is A. To run jobs on any machine in the 
LoadLeveler cluster, users need the same UID (the system ID number for a 
user) and the same GID (the system ID number for a group) for every 
machine in the cluster. If you do not have a user ID on a machine, your jobs 
will not run on that machine. Also, many commands, such as llq, will not 
work correctly if a user does not have an ID on the central manager machine. 

Question 2 - The answer is C. The High Availability Cluster Multiprocessing 
Control Workstation (HACWS) requires two control workstations to be 
physically connected to any frame. A Y-cable is used to connect the single 
connector on the frame supervisor card to each control workstation. 

Question 3 - The answer is D. If the primary CWS fails, the backup CWS can 
assume all functions with the following exceptions: Updating passwords (if SP 
User Management is in use), adding or changing SP users, changing 
Kerberos keys (the backup CWS is typically configured as a secondary 
authentication server), adding nodes to the system, and changing site 
environment information. 


A.11 Application-specific resources 

Answers to questions in 12.6, “Sample questions” on page 360, are as 
follows: 

Question 1 - The answer is B. The lsvsd command, when used with the -l 
flag, will list all the configured virtual shared disks on a node. To display all 
the virtual shared disks configured in all nodes, you may use the dsh 
command to run the lsvsd command on all nodes. 

Question 2 - The answer is D. In order to get the virtual shared disk working 
properly, you have to install the VSD software on all the nodes where you 
want VSD access (client and server), then you need to grant authorization to 
the Kerberos principal you will use to configure the virtual shared disks on the 
nodes. After you grant authorization, you may designate which node will be 
configured to access the virtual shared disks you define. After doing this, you 
can start creating the virtual shared disks. Remember that when you create 
virtual shared disks, you have to make them ready to become active. By 
default, a virtual shared disk is put into a stopped mode after it is created; so, 
you have to use the preparevsd command to put them into a suspended state 
that can be made active by using the resumevsd command afterwards. Refer to 
12.2.5, “Changing States of virtual shared disks” on page 338 for details. 
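
The state changes described in this answer can be pictured as a small state machine. The sketch below models only the two transitions named above (preparevsd and resumevsd); the table and the function name apply_command are illustrative and do not call or reproduce any PSSP code.

```python
# Allowed transitions taken from the answer text:
# a new virtual shared disk is stopped, preparevsd suspends it,
# and resumevsd makes a suspended disk active.
TRANSITIONS = {
    ("stopped", "preparevsd"): "suspended",
    ("suspended", "resumevsd"): "active",
}


def apply_command(state: str, command: str) -> str:
    """Return the new state, or raise if the command does not apply."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"{command} is not valid in state {state!r}") from None


state = "stopped"  # state of a newly created virtual shared disk
for command in ("preparevsd", "resumevsd"):
    state = apply_command(state, command)
print(state)
```

Running resumevsd directly on a stopped disk raises an error in this model, which reflects the point of the answer: the disk has to be prepared before it can be made active.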

Question 3 - The answer is B. In GPFS, there is no concept of a GPFS 
server or client node. A GPFS node is any node that has the GPFS 
code configured and up and running. GPFS nodes are always, at least, VSD 
client nodes, but they may also be VSD server nodes. 


Question 4 - The answer is A. The GPFS subsystem is a system resource 
controlled subsystem called mmfs. The name comes from the multimedia AIX 
product (video streamer) that was developed in San Jose, California. GPFS 
shares this common past, which explains the name mmfs (multimedia 
file system). 

Question 5 - The answer is A. It is possible to change the configuration of 
GPFS for performance tuning purposes. The mmchconfig command is capable 
of changing the following attributes: pagepool, dataStructureDump, 
mallocsize, maxFilesToCache, priority, and autoload. Refer to 12.4.3, 
“Managing GPFS” on page 355 for details. 

Question 6 - The answer is D. GPFS automatically stripes data across VSDs 
to increase performance and balance disk I/O. There are three possible 
striping algorithms that you can choose for GPFS to implement: Round 
Robin, Balanced Random, and Random. A striping algorithm may be set 
when a GPFS FS is created or can be modified as a FS parameter later. 
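
To make the idea of round-robin striping concrete, here is a minimal sketch that assigns consecutive block numbers to disks in rotation. It is not GPFS code and does not claim to match the GPFS implementation; the function name and the disk names vsd1 to vsd3 are hypothetical.

```python
from typing import Dict, List


def round_robin_stripe(num_blocks: int, vsds: List[str]) -> Dict[str, List[int]]:
    """Assign block numbers 0..num_blocks-1 to disks in strict rotation."""
    placement: Dict[str, List[int]] = {name: [] for name in vsds}
    for block in range(num_blocks):
        # Block i goes to disk i modulo the number of disks.
        placement[vsds[block % len(vsds)]].append(block)
    return placement


print(round_robin_stripe(7, ["vsd1", "vsd2", "vsd3"]))
```

With seven blocks and three disks, the first disk receives blocks 0, 3, and 6, so no disk ever holds more than one block above any other, which is how rotation balances disk I/O.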

Question 7 - The answer is B. GPFS requires RVSD even though your 
installation does not have twin-tailed disks or SSA loops for multi-host disk 
connection. 


A.12 Problem management tools 

Answers to questions in 13.8, “Sample questions” on page 383, are as 
follows: 

Question 1 - The answer is D. The log_event script uses the AIX alog 
command to write to a wraparound file. The size of the wraparound file is 
limited to 64 K. The alog command must be used to read the file. Refer to the 
AIX alog man page for more information on this command. 

Question 2 - The answer is D. Access to the problem management 
subsystem is controlled by the /etc/sysctl.pman.acl configuration file. All 
users who want to use the problem management facility must have a valid 
Kerberos principal listed in this file before attempting to define monitors. 
Refer to 13.5.1, “Authorization” on page 371 for details. 

Question 3 - The answer is C. The haemqvar command is a new command in 
PSSP 3.1 that allows you to display information regarding resource variables. 
Before this command was created, the only way you could get information for 
resource variables (such as syntax and usage information) was through the 
SP Perspectives graphical interface, in particular, through the Event 
Perspective. 

Question 4 - The answer is D. The trace facility is available through AIX. 
However, it comes in an optional fileset called bos.sysmgt.trace. You need to 
install this optional component if you want to activate the trace daemon and 
generate trace reports. 

Question 5 - The answer is B. All the PSSP log files are located in the 
/var/adm/SPlogs directory. All the RSCT log files are located in the 
/var/ha/log directory. Make sure you have enough free space for holding all 
the logged information. 

Question 6 - The answer is D. Event Management gathers information on 
system resources using Resource Monitors (RMs). Refer to 13.4, “Event 
Management” on page 367 for details. 

Question 7 - The answer is A. The Problem Management subsystem (PMAN) 
is a facility used for problem determination, problem notification, and problem 
solving. The PMAN subsystem consists of three components: pmand, 
pmanrmd, and sp_configd. 

Question 8 - The answer is A. The following steps are required to create a 
condition: Decide what you want to monitor, identify the resource variable, 
define the expression, and create the condition. Refer to 13.6.1, “Defining 
Conditions” on page 376 for details. 


A.13 RS/6000 SP software maintenance 

Answers to questions in 14.7, “Sample questions” on page 403, are as 
follows: 

Question 1 - The answer is B. The spsvrmgr command can be used to check 
the supervisor microcode levels on frames and nodes. The -G flag has to be 
used in order to get all frame supervisor cards checked. 

Question 2 - The answer is A. Every time a new PTF is applied, the 
supervisor microcode on frame and node supervisor cards should be 
checked. 

Question 3 - The answer is B. Refer to 14.5.2, “Supported Migration Paths” 
on page 397 for details. 

Question 4 - The answer is C. 


Question 5 - The answer is B. To restore an image of the CWS, do the 
following: Execute the normal procedure to restore any RS/6000 workstation, 
issue the /usr/lpp/ssp/bin/install_cw command, and verify your CWS. 


A.14 RS/6000 SP reconfiguration and update 

Answers to questions in 15.9, “Sample questions” on page 431, are as 
follows: 

Question 1 - The answers are C and D. When changes are made to IP 
addresses of adapters defined in the SDR, as is the case of the SP Switch 
adapter, the information should be updated into the SDR, and the node(s) 
affected should be customized. 

Question 2 - The answer is A. New tall frames, announced in 1998, have 
higher power requirements. You should confirm that your current installation 
can handle this higher power demand. 

Question 3 - The answer is D. If you set up the boot/install server, and it is 
acting as a gateway to the CWS, ipforwarding must be enabled. To turn 
it on, issue: /usr/sbin/no -o ipforwarding=1. 

Question 4 - The answer is D. There is only one partition in the SP system. 
Refer to 15.7, “Replacing to PCI-Based 332 MHz SMP Node” on page 426 for 
details. 

Question 5 - The answer is B. If you need to update the microcode of the 
frame supervisor of frame 2, enter: spsvrmgr -G -u 2:0. 


A.15 Problem diagnosis 

Answers to questions in 16.12, “Sample questions” on page 473, are as 
follows: 

Question 1 - The answer is D. When you download the PSSP installation 
tape into the control workstation or a boot/install server, the image is named 
ssp.usr.2.4.0.0 (for PSSP 3.1, it is called ssp.usr.3.1.0.0), but the setup_server 
script expects to find a file image called pssp.installp located in the main 
directory for the version you are installing (in this case, it is 
/spdata/sys1/install/pssplpp/PSSP-2.4). If this file (pssp.installp) is not 
present in that directory, the setup_server script will fail with this error. 
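
The check described above can be sketched as follows. This is only an illustration of the idea, assuming nothing about how setup_server itself looks for the file; the function name find_missing_installp is hypothetical.

```python
import os

def find_missing_installp(pssplpp_dir, expected_name="pssp.installp"):
    """Return None if expected_name exists in pssplpp_dir.

    Otherwise return the sorted list of names that are present, so you can
    see whether the image is still stored under its tape name. A missing
    directory yields an empty list.
    """
    if os.path.isfile(os.path.join(pssplpp_dir, expected_name)):
        return None
    if not os.path.isdir(pssplpp_dir):
        return []
    return sorted(os.listdir(pssplpp_dir))
```

If the function returns a list containing a name such as ssp.usr.2.4.0.0, the image was downloaded but never renamed to the file name that the script expects.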

Question 2 - The answer is A. If for some reason the /etc/passwd file gets 
erased or emptied, as happened here, you will not be able to log on to this 
node until the file is restored. To do that, you have to start the node in 
maintenance mode and restore the /etc/passwd file before attempting to log 
on to that node again. Make sure you run supper update on the files if you keep a 
single copy of the /etc/passwd file for your system. 

Question 3 - The answer is True. Although the control workstation plays a 
key role in the RS/6000 SP, it is not essential for having the nodes up and 
running. The most critical factor on the control workstation dependency is the 
fact that the SDR is located there, and by default, the control workstation is 
also the authentication server. 

Question 4 - The answer is A. The supfilesrv daemon runs on all the file 
collection servers. If the daemon is not running, clients will report this error 
message when trying to contact the server. 

Question 5 - The answers are B and C. When the error message refers to 
authenticator decoding problems, the cause is usually one of two things. The 
first is the time difference between the client and the server machine: 
Kerberos uses a time stamp to encode and decode messages, so if the clocks 
differ by more than five minutes, Kerberos fails with this error. The other 
common case is a corrupted or out-of-date /etc/krb-srvtab file, which also 
causes Kerberos to fail. 
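
The five-minute rule above amounts to a simple comparison. The sketch below assumes you have already obtained both clock readings as seconds since the epoch; how you collect them is outside its scope, and the names used are hypothetical.

```python
MAX_SKEW_SECONDS = 5 * 60  # the five-minute tolerance described in the answer

def clock_skew_ok(client_epoch, server_epoch, max_skew=MAX_SKEW_SECONDS):
    """Return True if the two clocks differ by no more than max_skew seconds."""
    return abs(client_epoch - server_epoch) <= max_skew
```

The comparison is symmetric, so it does not matter which of the two machines is ahead.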

Question 6 - The answer is C. When installing PSSP, the installp command 
will check the .toc file. This file is not generated automatically when you move 
files around in the directory. Always use the inutoc command to update the 
table of contents of a directory before using the installp command. 
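
One way to notice that a table of contents may need regenerating is sketched below. Comparing modification times is a heuristic chosen for this example, not the way installp or inutoc work, and the function name toc_is_stale is hypothetical.

```python
import os

def toc_is_stale(directory):
    """Return True if .toc is missing or older than any other file in directory."""
    toc = os.path.join(directory, ".toc")
    if not os.path.isfile(toc):
        return True
    toc_mtime = os.path.getmtime(toc)
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Only regular files other than .toc itself are compared.
        if name != ".toc" and os.path.isfile(path):
            if os.path.getmtime(path) > toc_mtime:
                return True
    return False
```

A True result only suggests that files were added or changed after the .toc file was last written; running inutoc on the directory remains the actual fix.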


Appendix B. Using the additional material 


This redbook contains additional material on a CD-ROM, which is also 
available in the form of Web material. See the appropriate section below for 
instructions on using or downloading each type of material. 


B.1 Using the CD-ROM 

The CD-ROM that accompanies this redbook contains the following: 


File name         Description 

start.htm         Starts the practice examination 
Readme.asc        Instructions file 
Sg245348.gif      Certification Logo 
exam directory    List of exam gifs and htm files 


B.1.1 System requirements for using the CD-ROM 

The following system configuration is recommended for optimal use of the 
CD-ROM. 


Hard disk space: 5 MB minimum 

Browser: HTML browser 

Other: CD-ROM drive 


B.1.2 How to use the CD-ROM or diskette 

You can access the contents of the CD-ROM by pointing your Web browser 
at the file start.htm in the CD-ROM root directory and following the links found 
there. Alternatively, you can create a subdirectory (folder) on your 
workstation and copy the contents of the CD-ROM into this folder. 


B.2 Locating the additional material on the Internet 

The CD-ROM and Web material associated with this redbook is also available 
in softcopy on the Internet from the IBM Redbooks Web server. Point your 
Web browser to: 

ftp://www.redbooks.ibm.com/redbooks/SG245348 

Alternatively, you can go to the IBM Redbooks Web site at: 

ibm.com/redbooks 

Select the Additional materials and open the directory that corresponds with 
the redbook form number. 


B.3 Using the Web material 

The additional Web material that accompanies this redbook includes the 
following: 

File name Description 

sg245348.zip Practice exam material (using zip) 

B.3.1 System requirements for downloading the Web material 

The following system configuration is recommended for downloading the 
additional Web material. 

Hard disk space: 5 MB minimum 

Browser: HTML browser 

B.3.2 How to use the Web material 

Create a subdirectory (folder) on your workstation and copy the contents of 
the Web material into this folder. Proceed to unzip the file using a zip tool. 
After extracting all the files, you can access the contents by 
pointing your Web browser at the file start.htm in the root directory and 
following the links found there. 


Appendix C. Special notices 


This publication is intended to help IBM Customers, Business Partners, IBM 
System Engineers, and other RS/6000 SP specialists who are involved in 
Parallel System Support Programs (PSSP) projects including the education of 
RS/6000 SP professionals responsible for installing, configuring, and 
administering PSSP. The information in this publication is not intended as the 
specification of any programming interfaces that are provided by Parallel 
System Support Programs. See the PUBLICATIONS section of the IBM 
Programming Announcement for PSSP for more information about what 
publications are considered to be product documentation. 

References in this publication to IBM products, programs or services do not 
imply that IBM intends to make these available in all countries in which IBM 
operates. Any reference to an IBM product, program, or service is not 
intended to state or imply that only IBM's product, program, or service may be 
used. Any functionally equivalent program that does not infringe any of IBM's 
intellectual property rights may be used instead of the IBM product, program 
or service. 

Information in this book was developed in conjunction with use of the 
equipment specified, and is limited in application to those specific hardware 
and software products and levels. 

IBM may have patents or pending patent applications covering subject matter 
in this document. The furnishing of this document does not give you any 
license to these patents. You can send license inquiries, in writing, to the IBM 
Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 
10504-1785. 

Licensees of this program who wish to have information about it for the 
purpose of enabling: (i) the exchange of information between independently 
created programs and other programs (including this one) and (ii) the mutual 
use of the information which has been exchanged, should contact IBM 
Corporation, Dept. 600A, Mail Drop 1329, Somers, NY 10589 USA. 

Such information may be available, subject to appropriate terms and 
conditions, including in some cases, payment of a fee. 

The information contained in this document has not been submitted to any 
formal IBM test and is distributed AS IS. The use of this information or the 
implementation of any of these techniques is a customer responsibility and 
depends on the customer's ability to evaluate and integrate them into the 
customer's operational environment. While each item may have been 
reviewed by IBM for accuracy in a specific situation, there is no guarantee 
that the same or similar results will be obtained elsewhere. Customers 
attempting to adapt these techniques to their own environments do so at their 
own risk. 

© Copyright IBM Corp. 2000 

Any pointers in this publication to external Web sites are provided for 
convenience only and do not in any manner serve as an endorsement of 
these Web sites. 


This document contains examples of data and reports used in daily business 
operations. To illustrate them as completely as possible, the examples 
contain the names of individuals, companies, brands, and products. All of 
these names are fictitious and any similarity to the names and addresses 
used by an actual business enterprise is entirely coincidental. 

Reference to PTF numbers that have not been released through the normal 
distribution process does not imply general availability. The purpose of 
including these reference numbers is to alert IBM customers to specific 
information relative to the implementation of the PTF when it becomes 
available to each customer according to the normal IBM PTF distribution 
process. 


The following terms are trademarks of the International Business Machines 
Corporation in the United States and/or other countries: 


IBM 
MVS/ESA 
Nways 
PAL 
POWERparallel 
PowerPC 604 
PS/2 
RS/6000 
S/390 
Scalable POWERparallel Systems 
SP 
System/390 
TURBOWAYS 
Versatile Storage Server 


The following terms are trademarks of other companies: 


C-bus is a trademark of Corollary, Inc. in the United States and/or other 
countries. 


Java and all Java-based trademarks and logos are trademarks or registered 
trademarks of Sun Microsystems, Inc. in the United States and/or other 
countries. 


Microsoft, Windows, Windows NT, and the Windows logo are trademarks of 
Microsoft Corporation in the United States and/or other countries. 

PC Direct is a trademark of Ziff Communications Company in the United 
States and/or other countries and is used by IBM Corporation under license. 

ActionMedia, LANDesk, MMX, Pentium and ProShare are trademarks of Intel 
Corporation in the United States and/or other countries. 

UNIX is a registered trademark in the United States and other countries 
licensed exclusively through The Open Group. 

SET, SET Secure Electronic Transaction, and the SET Logo are trademarks 
owned by SET Secure Electronic Transaction LLC. 

Other company, product, and service names may be trademarks or service 
marks of others. 


Appendix D. Related publications 


The publications listed in this section are considered particularly suitable for a 
more detailed discussion of the topics covered in this redbook. 


D.1 IBM Redbooks 

For information on ordering these publications see “How to get IBM 
Redbooks” on page 505. 

• GPFS: A Parallel File System, SG24-5165 

• IBM 9077 SP Switch Router: Get Connected to the SP Switch, SG24-5157 

• Inside the RS/6000 SP, SG24-5145 

• PSSP 3.1 Announcement, SG24-5332 

• PSSP Version 3 Survival Guide, SG24-5344 

• RS/6000 SP Software Maintenance, SG24-5160 

• RS/6000 SP System Management: Power Recipes for PSSP 3.1, 
SG24-5628 

• RS/6000 SP Systems Handbook, SG24-5596 

• SP Perspectives: A New View of Your SP, SG24-5180 

• The RS/6000 SP Inside out, SG24-5374 

• Understanding and Using the SP Switch, SG24-5161 


D.2 IBM Redbooks collections 


Redbooks are also available on the following CD-ROMs. Click the CD-ROMs 
button at ibm.com/redbooks for information about all the CD-ROMs offered, 
updates and formats. 


CD-ROM Title                                                       Collection Kit Number 

IBM System/390 Redbooks Collection                                 SK2T-2177 
IBM Networking Redbooks Collection                                 SK2T-6022 
IBM Transaction Processing and Data Management Redbooks Collection SK2T-8038 
IBM Lotus Redbooks Collection                                      SK2T-8039 
Tivoli Redbooks Collection                                         SK2T-8044 
IBM AS/400 Redbooks Collection                                     SK2T-2849 
IBM Netfinity Hardware and Software Redbooks Collection            SK2T-8046 
IBM RS/6000 Redbooks Collection                                    SK2T-8043 
IBM Application Development Redbooks Collection                    SK2T-8037 
IBM Enterprise Storage and Systems Management Solutions            SK3T-3694 


D.3 Other resources 

These publications are also relevant as further information sources: 

• AIX 4.3 Network Installation Management Guide and Reference, 
SC23-4113 

• AIX Problem Solving Guide and Reference, SC23-4123 

• AIX V4.3 Messages Guide and Reference, SC23-4129 

• AIX Version 4.3 Commands Reference, Volume 5, SC23-4119 

• AIX Version 4.3 System Management Guide: Communications and 
Networks, SC23-4127 

• General Parallel File System for AIX: Installation and Administration 
Guide, SA22-7278 

• IBM Parallel System Support Programs for AIX: Diagnosis Guide, 
GA22-7350 

• IBM Parallel System Support Programs for AIX: Managing Shared Disks, 
SA22-7349 

• IBM RS/6000 SP Planning Volume 1, Hardware and Physical 
Environment, GA22-7280 

• IBM RS/6000 SP Planning Volume 2, Control Workstation and Software 
Environment, GA22-7281 

• PSSP: Administration Guide, GC23-3897 

• PSSP: Administration Guide, SA22-7348 

• PSSP: Command and Technical Reference, GC23-3900 

• PSSP: Command and Technical Reference, Volume 1 and Volume 2, 
SA22-7351 

• PSSP: Diagnosis and Messages, GC23-3899 

• PSSP: Installation and Migration Guide, GA22-7347 

• PSSP: Installation and Migration Guide, GC23-3898 

• Site and Hardware Planning Information, SA38-0508 


D.4 Referenced Web Sites 


• http://dscrs6k.aix.dfw.ibm.com 

• http://www.ibm.com/certify 

• http://www.storage.ibm.com 


How to get IBM Redbooks 


This section explains how both customers and IBM employees can find out about IBM Redbooks, 
redpieces, and CD-ROMs. A form for ordering books and CD-ROMs by fax or e-mail is also provided. 

• Redbooks Web Site ibm.com/redbooks 

Search for, view, download, or order hardcopy/CD-ROM Redbooks from the Redbooks Web site. 
Also read redpieces and download additional materials (code samples or diskette/CD-ROM images) 
from this Redbooks site. 

Redpieces are Redbooks in progress; not all Redbooks become redpieces and sometimes just a few 
chapters will be published this way. The intent is to get the information out much quicker than the 
formal publishing process allows. 

• E-mail Orders 

Send orders by e-mail including information from the IBM Redbooks fax order form to: 

                              e-mail address 
In United States or Canada    pubscan@us.ibm.com 
Outside North America         Contact information is in the “How to Order” section at this site: 
                              http://www.elink.ibmlink.ibm.com/pbl/pbl 

• Telephone Orders 

United States (toll free)     1-800-879-2755 
Canada (toll free)            1-800-IBM-4YOU 
Outside North America         Country coordinator phone number is in the “How to Order” 
                              section at this site: 
                              http://www.elink.ibmlink.ibm.com/pbl/pbl 

• Fax Orders 

United States (toll free)     1-800-445-9269 
Canada                        1-403-267-4455 
Outside North America         Fax phone number is in the “How to Order” section at this site: 
                              http://www.elink.ibmlink.ibm.com/pbl/pbl 


This information was current at the time of publication, but is continually subject to change. The latest 
information may be found at the Redbooks Web site. 


IBM Intranet for Employees - 

IBM employees may register for information on workshops, residencies, and Redbooks by accessing 
the IBM Intranet Web site at http://w3.itso.ibm.com/ and clicking the ITSO Mailing List button. 
Look in the Materials repository for workshops, presentations, papers, and Web pages developed 
and written by the ITSO technical professionals; click the Additional Materials button. Employees may 
access MyNews at http://w3.ibm.com/ for redbook, residency, and workshop announcements. 


IBM Redbooks fax order form 

Please send me the following: 

Title Order Number Quantity 


First name 

Last name 


Company 

Address 

City 

Postal code 

Country 

Telephone number 

Telefax number 

VAT number 


□ Invoice to customer number 

□ Credit card number 


Credit card expiration date Card issued to Signature 

We accept American Express, Diners, Eurocard, Master Card, and Visa. Payment by credit card not 
available in all countries. Signature mandatory for credit card payment. 


Abbreviations and acronyms 


ACL       Access Control Lists 
ADSM      ADSTAR Distributed Storage Manager 
AFS       Andrew File System 
AIX       Advanced Interactive Executive 
AMG       Adapter Membership Group 
ANS       Abstract Notation Syntax 
API       Application Programming Interface 
ARP       Address Resolution Protocol 
BIS       Boot/Install Server 
BOS       Basic Overseer Server 
BSD       Berkeley Software Distribution 
BUMP      Bring-Up Microprocessor 
CDS       Cell Directory Service 
CEC       Central Electronics Complex 
CLIO/S    Client Input Output Socket 
CP        Crown Prince 
CPU       Central Processing Unit 
CSMA/CD   Carrier Sense, Multiple Access/Collision Detect 
CSS       Communication Subsystem 
CWS       Control Workstation 
DB        Database 
DCE       Distributed Computing Environment 
DFS       Distributed File System 
DMA       Direct Memory Access 
DNS       Domain Name Service 
EM        Event Management 
EM API    Event Management Application Programming Interface 
EMCDB     Event Management Configuration Database 
EMD       Event Manager Daemon 
EPROM     Erasable Programmable Read-Only Memory 
ERP       Enterprise Resource Planning 
FCS       Fiber Channel Standard 
FDDI      Fiber Distributed Data Interface 
FIFO      First-In First-Out 
FLDB      Fileset Location Database 
FS        File System 
GB        Gigabytes 
GL        Group Leader 
GPFS      General Parallel File System 
GS        Group Services 
GSAPI     Group Services Application Programming Interface 
GUI       Graphical User Interface 
GVG       Global Volume Group 
HACMP     High Availability Cluster Multiprocessing 
HACMP/ES  High Availability Cluster Multiprocessing Enhanced Scalability 
HACWS     High Availability Control Workstation 
HB        Heart Beat 
HIPPI     High Performance Parallel Interface 
HIPS      High Performance Switch 
HRD       Host Respond Daemon 
HSD       Hashed Shared Disk 
HSSI      High Speed Serial Interface 
IBM       International Business Machines Corporation 
IP        Internet Protocol 
ISB       Intermediate Switch Board 
ISC       Intermediate Switch Chip 
ITSO      International Technical Support Organization 
JFS       Journaled File System 
LAN       Local Area Network 
LCD       Liquid Crystal Display 
LED       Light Emitting Diode 
LFS       Local File System 
LP        Logical Partition 
LRU       Least Recently Used 
LSC       Link Switch Chip 
LV        Logical Volume 
LVM       Logical Volume Manager 
MAC       Media Access Control 
MACN      Monitor and Control Nodes 
MB        Megabytes 
MCA       Micro Channel Architecture 
MI        Manufacturing Interface 
MIB       Management Information Base 
MIMD      Multiple Instruction Stream, Multiple Data Stream 
MPI       Message Passing Interface 
MPL       Message Passing Library 
MPP       Massive Parallel Processors 
NFS       Network File System 
NIM       Network Installation Management 
NIS       Network Information System 
NSB       Node Switch Board 
NSC       Node Switch Chip 
NVRAM     Non-volatile Memory 
OID       Object ID 
ODM       Object Data Management 
OLTP      Online Transaction Processing 
OSF       Open Software Foundation 
P2SC      POWER2 Super Chip 
PAIDE     Performance Aide for AIX 
PE        Parallel Environment 
PID       Process ID 
PIOFS     Parallel I/O File System 
PMAN      Problem Management 

PP        Physical Partition 
PSSP      Parallel System Support Programs 
PTC       Prepare to Commit 
PTPE      Performance Toolbox Parallel Extensions 
PTX       Performance Toolbox for AIX 
PV        Physical Volume 
RAM       Random Access Memory 
RCP       Remote Copy Protocol 
RM        Resource Monitor 
RMAPI     Resource Monitor Application Programming Interface 
RPC       Remote Procedure Calls 
RPQ       Request for Product Quotation 
RSCT      RS/6000 Cluster Technology 
RVSD      Recoverable Virtual Shared Disk 
SAMI      Service and Manufacturing Interface 
SBS       Structured Byte Strings 
SCSI      Small Computer Systems Interface 
SDR       System Data Repository 
SMP       Symmetric Multiprocessor 
SNMP      Simple Network Management Protocol 
SPMI      System Performance Measurement Interface 
SPOT      Sequence Power Off Timer 
SPUM      SP User Management 
SRC       System Resource Controller 
SSA       Serial Storage Architecture 
SUP       Software Update Protocol 
TGT       Ticket-Granting Ticket 
TLC       Tape Library Connection 
TP        Twisted Pair 
TS        Topology Services 
UTP       Unshielded Twisted Pair 
VLDB      Volume Location Database 
VSD       Virtual Shared Disk 
VSS       Versatile Storage Server 


Index 


Symbols 

/etc/rc.sp 365 
/unix 366 

/usr/include/sys/trchkid.h 365 
/var/adm/ras 365 
/var/adm/SPlogs 366 
/var/adm/SPlogs/SPdaemon.log 372 

Numerics 

100BASE-TX 88,89,97,99 
10BASE-2 88 
10BASE-T 88 
332 MHz SMP node 405 
8274 98 

A 

abbreviations 507 
Access control 220 
Access Control Lists 202 
ACL files 197 
acronyms 507 
adapters 

Ethernet 271,274 
FDDI 274 
switch 274 
Token Ring 274 
Adding a frame 405, 406 
Adding a Switch 425 
AFS 189 

adduser 212 
chown 212 
creategroup 212 
delete 212 
examine 212 
kas 212 
kinit 212 
klog.krb 212 
listowned 212 
membership 212 
pts 212 

removeusers 212 
setfields 212 
token.krb 212 
AIX 

filesets 258 


Images installation 290 
lpp installation 291 
SRC 202 
AIX error log 364 
Amd 

See Berkeley automounter 
apply the PTFs 389 
ARP cache 97 
auth_install 188 
auth_methods 188 
auth_root_rcmd 188 
Authentication methods 188 
Authorization 203 
AutoFS 450 
Automounter 

/etc/amd/amd-maps/amd.u 243 
AIX Automounter 219 
migration 242 
mkautomap 242 
autosensing 99 

B 

backup 257 
backup images 389 
Berkeley automounter 219 
BNC 88 

boot/install server 29, 92 
configuring 279 
selecting 279 
bootlist 124 
bootp 285 
bootp_response 401 
bos.rte 363 

bos.sysmgt.serv_aid 363 
bos.sysmgt.trace 364 
bosinst.data 131 
broadcast storm 96 
BUMP 461 

C 

Central Electronics Complex (CEC) 146 
Central Manager 
see LoadLeveler 
Coexistence 427 
Commands 

/var/sysman/supper update 234 
/var/sysman/supper 227 
add_principal 197 
arp 305 
cap 199 

change_admin_password 198 
change_password 198 
chauthent 188 
chauthpar 188,276 
chkp 199 
cpw 198 

create_krb_files 288 
CSS_test 300,306,422 
dsh 190 

Eannotator 280,418 
Eclock 418 
Eprimary 281 
Estart 285,422 
Etopology 280 
exportfs 415 
files 236 
ftp 187 

haemqvar 374, 377 
hmadm 204 
hmcmds 202, 425 
hmmon 202 
hmreinit 425 
install 236 
install_cw 253, 392 
inutoc 394 
k4init 195, 203 
k4list 203,212 
k5dcelogin 209 
kas 212 
kdb_util 195 
dump 200 
load 200 
kinit 212 
klist 212 
klog.krb 212 
kpasswd 195,198,212 
ksrvutil 

change 200 
list 200 
kstash 195 
log 236 
lppdiff 299 
lsauthent 188, 208 
lsauthpar 188 
lslpp 298 
lssrc 301 
mkautomap 242 
mkconfig 288 
mkinstall 288 
mkkp 197 
mksysb 256 
mmconfig 346 
netstat 305 
nodecond 202, 281 
perspectives 306 
ping 305 

pmanrmloadSDR 373 
pts 212 
rcmdtgt 210 
rcp 190, 204 
rexec 187 
rlogin 305 

rsh 189,190,204,206 
s1term 202, 281, 419 
savevg 256 
scan 235 
SDR_test 298, 306 
SDRArchive 410 
SDRGetObjects 304 
serve 236 

setup_authent 195, 200, 211,252 

setup_server 200,279,288,415 

smit mkclient 226 

smit mkmaster 225 

smit mkslave 226 

smit site_env_dialog 220 

smit spmkuser 222 

smit sprmuser 223 

spacs_cntrl 224 

spadaptrs 274,414 

spbootins 277, 416 

spchvgobj 277,416 

spethernt 271,410,411 

spframe 269, 407 

sphardware 202 

sphostnam 275,415 

sphrdwrad 274, 411 

spled 304 

splst_syspar 300 

splst_versions 299 

splstdata 271,305, 412 

spluser 223 

spmkuser 222 

spmon 190, 202, 303, 304, 306 
spmon -d 409 
spmon -d -G 422 
spmon_ctest 298, 306 
spmon_itest 298, 306 
spsetauth 275 
spsitenv 222, 268 
spsvrmgr 270,410 
spverify_config 300, 306 
supper 228 
diskinfo 236 
files 236 
install 236 
log 236 
rlog 236 
scan 236 
serve 236 
status 236 
update 236 
where 236 
sysctl 190 
sysdumpdev 365 
SYSMAN_test 299,306,419 
syspar_ctrl 276, 302 
telnet 187,305 
token.krb 212 
traceroute 305 
unlog 212 
update 236 
when 236 
Configuration 267 
connectivity 255 
connwhere 117 
console 281 
control workstation 29 
CSMA/CD 88 
Customizing 

manually 287 
CWS 

See control workstation 

D 

Daemons 

automount 242 
automountd 242 
css.summlog 308 
cssadm 308 

fault_service_Worm_RTG_SP 308 


haemd 308 
hagsd 308 
hagsglsmd 308 
hardmon 309 
hatsd 308 
hmrmd 204 
hrd 308 

Job Switch Resource Table Services 308 

kadmind 192, 308 

kerberos 191,308 

kpropd 192,308 

krshd 207, 208 

pmand 308, 371 

pmanrmd 308, 371 

rshd 207, 208, 372 

sdrd 308 

sp_configd 308, 371 
splogd 204, 308 
spmgrd 308 
supfilesrv 308 
supman 228 
sysctld 214,308 
Worm 308, 309 
xntpd 308 
ypbind 80, 226 
yppasswd 80 
ypserv 80, 226 
ypupdated 80 
Data Management 

File Collections 219 
NIS 219 
Diagnosing 

604 High Node 460 
File Collection 452 
Kerberos 454 
Network Boot Process 438 
SDR Problems 446 
setup_server 433 
Switch 462 

System Connectivity 459 
User Access 447 
Diagnosis 433 
Directories 

/share/power/system/3.2 227 
/spdata/sys1/install/images 390 
predefined 256 
disk 

space allocation 256 
DNS 86 
DOMAIN 79 

dynamic port allocation 140 

E 

endpoint map 140 
Enter 269 

Enterprise Server 145 
environment 268 
Error Conditions 461 
Ethernet 254, 255 
Ethernet switch 88, 94 
Event Management 309 
client 368 
haemd 368 

Resource Monitor Application Programming Interface 368 
Event Manager 173 

F 

Fast Ethernet 97, 98 
File Collections 

/share/power/system/3.2/.profile 229 
/var/sysman/file.collections 228 
/var/sysman/sup 231 
/var/sysman/sup/lists 228 
/var/sysman/supper update 234 
Available 227 
diskinfo 236 
hierarchical 231 
Master Files 228 
node.root 230, 231 
power_system 230, 231 
predefined file collections 230 
Primary file collections 229 
Resident 227 
rlog 236 
scan 235, 236 
Secondary file collection 229 
secondary file collection 231 
Software Update Protocol 227 
status 236 
SUP 227 
sup.admin 230 
supper 228 
user.admin 230 
when 236 
where 236 
Files 


$HOME/.k5login 209 
$HOME/.netrc 187 
$HOME/.rhosts 187,208 
.config_info 288 
.install_info 288 
.profile 253 
/.k 193 

/etc/amd/amd-maps/amd.u 243 

/etc/environment 253 

/etc/ethers 225 

/etc/group 225 

/etc/hosts 225 

/etc/hosts.equiv 187,208 

/etc/inetd.conf 208, 254 

/etc/inittab 253, 254 

/etc/krb.conf 193 

/etc/krb.realms 194 

/etc/krb-srvtab 193,203,210 

/etc/netgroup 225 

/etc/networks 225 

/etc/passwd 225 

/etc/profile 253 

/etc/protocols 225 

/etc/publickey 225 

/etc/rc.net 254 

/etc/rpc 226 

/etc/security/group 226 

/etc/security/passwd 226 

/etc/services 226, 255 

/etc/sysctl.acl 214 

/etc/sysctl.conf 214 

/spdata/sys1/install//lppsource 291 

/spdata/sys1/install/images 290 

/spdata/sys1/install/pssp 292 

/spdata/sys1/install/pssplpp/PSSP-x.x 291 

/spdata/sys1/spmon/hmacls 203 

/tmp/tkt 193 

/tmp/tkt_hmrmd 204 

/tmp/tkt_splogd 204 

/usr/lpp/ssp/bin/spmkuser.default 222 

/var/adm/SPlogs/kerberos/kerberos.log 194 

/var/kerberos/database/slavesave 200 

<hostname>-new-srvtab 288 

bosinst_data 292 

CSS_test.log 422 

firstboot.cust 290 

image.data 292 

pmandefaults 372 

pssp_script 292 

script.cust 226, 289 
SDR_dest_info 434 
SPdaemon.log 372 
trchkid.h 365 
tuning.commercial 417 
tuning.cust 289,417 
tuning.default 417 
tuning.development 417 
tuning.scientific 417 
frame 8, 269 
model frame 9 
short expansion frame 10 
short model frame 10 
SP Switch frame 10 
tall expansion frame 9 
tall model frame 9 
frame to frame 152 

G 

get_auth_method 188,206,208 
Global file systems 132 
Graphical User Interface 306 
GRF 26 

Group Services 309 

H 

haemqvar 374, 377 
Half duplex 88 
hardmon 203 
hardmon principal 202 
hardware address 274 
Hardware Perspectives 307 
hd6 365 
hd7 365 
HDX 

See Half Duplex 

High Availability Control Workstation 32, 321 
High Performance Gateway Node 26 
High Performance switch (HiPS) 153 
home directories 132 
hooks 365 

host impersonation 187 
Hostname 78 
initial 275 


I 

I/O rack 146 


IBM.PSSP.pm.User_state1 373 
impersonation 187 
Install Ethernet, 92 
Installation 267 
Intermediate Switch Board 10 
ip_address 78 
ipforwarding 416 
ISB 

See Intermediate Switch Board 


J 

Job 

see LoadLeveler 


K 

K5MUTE 207 

kcmd 207, 208 

Kerberos 188,322 
/.k 193 
/tmp/tkt 193 
ACL files 197 
AFS 189 

authentication methods 276 

Authentication Server 191 

Authentication server 191 

authorization files 276 

File Collections 219 

hardmon 196 

Instance 190 

k4list 212 

kas 212 

kdestroy 212 

kinit 212 

klist 212 

klog.krb 212 

kpasswd 198,212 

port 207 

port (v4) 207 

ports 255 

Principal 190, 196, 197 
rcmd 196 
Realm 191 
server keys 200 
Service Keys 191 
Service Ticket 191 
sysct 213 
sysctl 189,213 
TGT 191 
Ticket 191 

Ticket Cache File 191 
Ticket-Granting Ticket 191 
kshell port 207, 208 


kvalid_user 209 

L 

LED 
LED 231 439 
LED 260 439 
LED 299 439 
LED 600 439 
LED 606 439 
LED 607 439 
LED 608 439 
LED 609 439 
LED 610 439 
LED 611 439 
LED 613 439 
LED 622 439 
LED 625 439 
LED C06 439 
LED C10 439 
LED C40 440 
LED C42 440 
LED C44 440 
LED C45 440 
LED C46 440 
LED C48 440 
LED C52 440 
LED C54 440 
LED C56 440 


libc.a 207 
libspk4rcmd.a 207 
libvaliduser.a 209 


LoadLeveler 

central manager 317 
cluster 315 
job step 316 
scheduler 317 
SYSPRIO 318 
logs 310 

lsmksysb 393, 394 

M 

MAC address 274 

manual node conditioning 442 

Migration 397 


Mirroring 427 
mksysb 389 
Modification 397 

N 

naming conventions 257 
Network Boot Process 439 
Network Information System 
client 80 
maps 81 
Master Server 80 
Slave Server 80 
Network installation 96 
NFS 341 
nim_res_op 394 
NIS 

/etc/ethers 225 
/etc/group 225 
/etc/netgroup 225 
/etc/networks 225 
/etc/passwd 225 
/etc/protocols 225 
/etc/publickey 225 
/etc/rpc 226 
/etc/security/group 226 
/etc/security/passwd 226 
/etc/services 226 
clients 226 

master server 225, 226 
NIS client 226 
passwd 227 
script.cust 226 
slave 226 
slave server 226 
yppasswd 227 
node 

boot 281 

dependent node 25 
external node 22 
High node 14 
installation 281 
Internal Nodes 14 
standard node 14 
Thin node 14 
Wide node 14 
Node conditioning 282 
Node Object 115 
Nways LAN RouteSwitch 98 
P 

parity 108 
PATH 253 
Perspectives 

A New View of Your SP 307,311 
plain text passwords 187 
PMAN See Problem Management 
pmand 371 
pmanrmd 371 
pmanrmloadSDR 373 
Power Supplies 11 
POWER3 19 
PowerPC 17 
prerequisites 258 
Problem Management 371 
PMAN_LOCATION 374 
PMAN_RVFIELD0 374 
pmand daemon 371 
pmandefaults script 372 
pmanrmd daemon 371 
pmanrmloadSDR command 373 
Problems 

231 LED 441 
611 LED 442 
Accessing the Node 459 
Accessing User’s Directories 449 
Allocating the SPOT Resource 436 
AMD 448 

Authenticated Services 456 

C45 LED 443 

C48 LED 444 

Class Corrupted 447 

Connection to Server 446 

Decoding Authenticator 458 

Estart Failure 463 

Eunfence 467 

Fencing Primary nodes 467 

Kerberos Daemon 458 

Kerberos Database Corruption 456 

Logging 449 

lppsource Resource 437 

mksysb Resource 437 

Network Commands 459 

NIM Cstate and SDR 435 

NIM Export 434 

Node Installation from mksysb 445 
Physical Power-off 461 
Pinging to SP Switch Adapter 466 
SDR 434 


Service’s Principal Identity 455 
SPOT Resource 437 
Topology-Related 459 
User Access or Automount 449 
User’s Principal Identity 455 
PROCLAIM messages 96 
protocol 79 
PSSP 

filesets 259 
lpp installation 291 
Update 398 


R 

raw storage 108 
rcmd 207, 210 
rcmd principal 210 
r-commands 204 
Reconfiguration 405 
Recoverable Virtual Shared Disk 
hc 340 
rvsd 340 
Release 397 

remote execution commands 204 
Resource Monitors 367 
pmanrmd 371 
Resource Variables 

IBM.PSSP.pm.User_state1 373 
restore CWS or SP nodes 389 
RFC 1416 188 
RFC 1508 188 
RFC 1510 188 
RJ-45 88 

RMAPI, see also Resource Monitor Application Programming Interface in Event Management 368 
root.admin 203 
route add -net 78 
routing 91,92,93 


S 

S70 145 
S7A 145 
S80 145 
Script 

/usr/lpp/ssp/config/admin/cw_allowed 224 
/usr/lpp/ssp/config/admin/cw_restrict_login 224 
secret password 188 
Security 

ftp 187, 188 
rcp 187, 189 
rexec 187 
rlogin 187 
rsh 187,189,206 
telnet 187, 188 
serial link 254, 255 

Service and Manufacturing Interface (SAMI) 152 
set_auth_method 188 
shared-nothing 49 
shell port 208 

Simple Network Management Protocol 371 
SMIT 

Additional Adapter Database Information 274 
Boot/Install Server Information 278 
Change Volume Group Information 277 
Get Hardware Ethernet Address 274 
Hostname Information 275 
List Database Information 306 
non-SP Frame Information 269 
RS/6000 SP Installation/Configuration Verification 306 

RS/6000 SP Supervisor Manager 270 
Run setup_server Command 279 
Select Authorization Methods for Root access to 
Remote Commands 276 
Set Primary/Primary Backup Node 281 
Site Environment Information 268 
SP Ethernet Information 271 
SP Frame Information 269 
Start Switch 285 
Store a Topology File 280 
Topology File Annotator 280 
smit hostname 76 
smit mktcpip 76 

SNMP See Simple Network Management Protocol 

Software Maintenance 389 

SP LAN 88 

SP Log Files 366 

SP security 

Kerberos 189 
SP Switch frame 10 
SP Switch Router 26 
sp_configd 371 
spacs_cntrl 224 

SP-attached servers 22, 145, 408 
spbootins 120 
spbootlist 124 
spchvgobj 118,119 
spcn 173 


SPCNhasMessage 173 
spdata 256 
spk4rsh 207, 208 
splstdata 123, 174 
spmirrorvg 121 
spmkvgobj 116 
spmon 173 
spot_aix432 394 
SPUM 

smit site_env_dialog 220 
spunmirrorvg 122 
src 173 

SRChasMessage 173 

SSA disks 126 

subnet 91 

supervisor card 12 

supervisor microcode 270, 410 

Switch 

Operations 

clock setting 281 
primary node setting 281 
Start 285 

Topology setting 280 

sysctl 

/etc/sysctl.acl 214 
/etc/sysctl.conf 214 
Kerberos 213 
Tcl 214 
SYSPRIO 

see LoadLeveler 
System Dump 365 
System Management 225 
File Collection 227 
NIS 224, 225 
SystemGuard 461 

T 

TB3MX 153 
TCP/IP 255 
Thin-wire Ethernet 88 
ticket cache 204 
ticket forwarding 207 
Topology Services 309 
Reliable Messaging 368 
TP 

See Twisted Pair 
trace facility 364 
tunables 254 
Twisted Pair 88 


U 

u20 440 
UNIX 87 

UNIX domain sockets 368 
Unshielded Twisted Pair 88 
uplink 95 
User Management 

/usr/lpp/ssp/config/admin/cw_allowed 224 
SPUM 220, 221 

/usr/lpp/ssp/config/admin/cw_restrict_login 224 
UTP 

See Unshielded Twisted Pair 


V 

Version 397 

Virtual Front Operator Panel 203 
volume group 277 
Volume_Group 114 
